Ertuğrul Mikail Tunç, login: abjz066, Supervisor: Dr. Ilir Gashi 2012-2013
City University London
Information Systems BSc (Hons) Final Year Project Report
Academic Year: 2012-13
A Comparative Analysis of the Hardware and
Software Component Performance on a Virtualised
Environment Vs. a Physical Environment
By
Ertuğrul Mikail Tunç
Project supervisor: Dr. Ilir Gashi
5 August 2013
Project Report Document
Version
Version | Author | Date | Description
0.1 | Ertuğrul Mikail Tunç | 03/03/2013 | Initial document skeleton
0.2 | Ertuğrul Mikail Tunç | 06/05/2013 | Introduction
0.3 | Ertuğrul Mikail Tunç | 14/05/2013 | Literature review
0.4 | Ertuğrul Mikail Tunç | 01/07/2013 | Method
0.5 | Ertuğrul Mikail Tunç | 08/07/2013 – 17/07/2013 | Database Analysis
0.6 | Ertuğrul Mikail Tunç | 18/07/2013 – 21/07/2013 | Web Analysis
0.7 | Ertuğrul Mikail Tunç | 27/07/2013 – 31/07/2013 | Improved presentation and readability of document; improved quality of data in results section
0.8 | Ertuğrul Mikail Tunç | 01/08/2013 – 02/08/2013 | Conclusion
0.9 | Ertuğrul Mikail Tunç | 03/08/2013 | Final review; minor changes
1.0 | Ertuğrul Mikail Tunç | 04/08/2013 | Final piece
Contents
A Comparative Analysis of the Hardware and Software Component Performance on a Virtualised Environment Vs. a Physical Environment
Contents
1 Introduction
2 Outputs Summary
3 Literature Review
4 Method
4.1 Analysis
4.2 Design and Implementation
4.3 Evaluation
5 Results
5.1 Database Test Results
5.2 Summary
5.3 Web Test Results
5.4 Summary
6 Conclusions and Discussions
6.1 Introduction
6.2 Project Objectives and Research Questions
6.3 General Conclusions
6.4 Implications for Future Work
6.5 Project Management
7 Glossary
8 Appendices
1 Introduction
Virtualisation is one of the hottest topics in Information Technology. Businesses are moving to virtualised technologies at a remarkable rate. Previously, only large corporations had the cash flow and technical expertise to implement a virtualised platform; and even then, only servers or desktops would be virtualised.
Today, the influx of virtualisation software, services and hardware means that even small and medium-sized businesses are taking advantage of virtualisation. Software developers are deploying on virtualised platforms their mobile applications (take Instagram as an example), desktop applications (the popular file-syncing application Dropbox uses Amazon S3 virtualised storage to store user files), web applications (easyJet's seating service utilises Microsoft's Windows Azure virtualisation technology) and much more.
Gartner research from 2012 shows that "virtualisation penetration has surpassed 50% of all server workloads"¹.
Other research and analysis undertaken by Guy Rosen shows that the number of instance requests for the Amazon EC2 service in a single region over a 24-hour period was approximately 50,000². This is indeed a staggering number of instance requests and, although the analysis is based on black-box observations (Amazon does not make this information publicly available), I would not be surprised if the numbers were close to the truth.
These trends indicate that virtualisation is only going to gain more ground and reach a wider audience. By this I mean that it will no longer be limited to highly specialised technical personnel as it was years ago; it will be available to people who are not very technical (with respect to virtualisation and hardware operations) who are, for example, looking to deploy a database instance in the 'cloud' somewhere with which their mobile app or web service can interact. The images in Appendix A paint this shift in the requirements of using this
¹ Gartner. Magic Quadrant for x86 Server Virtualization Infrastructure. [ONLINE] Available at: http://www.gartner.com/technology/reprints.do?id=1-1AVRXJL&ct=120612 [Accessed 08 April 2013].
² Guy Rosen. Anatomy of an Amazon EC2 Resource ID. Jack of all Clouds. [ONLINE] Available at: http://www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id/ [Accessed 09 April 2013].
technology, from highly technical to not very technical; they show that the average John Doe can come along and move a few sliders here and there in order to deploy a virtual environment.
The reason I mention all of this is to make it very clear that virtualisation and cloud computing (the two go hand-in-hand) are becoming more and more prevalent, and we are beginning to see business users demanding virtualisation without necessarily understanding the technical advantages and disadvantages of the technology; it is just seen as the next 'cool' thing that 'everyone else is doing'.
The question I will answer in this final year project dissertation is one of performance: whether workloads (such as a SQL database and a web application) are affected by moving them onto a virtualised platform. For example, will web application X perform any better or worse running on physical hardware compared to running on a virtualised hypervisor?
As there are countless scenarios that could be tested with regard to this hypothesis, I will limit the scope of my research by keeping the configuration static and making the results and analysis as comparable as possible (i.e., installing the exact same testing software, running the exact same tests, keeping the configuration between the virtual and physical OS as similar as possible, etc.).
The main objective of this research project is to compare the performance of different workloads on both physical and virtual infrastructure and to conclude, with sufficient evidence, which infrastructure performs better under the different workloads and by how much (in some measurable form). I aim to do this by:
o Running a variety of tests to try to achieve a realistic approach to testing. These tests are documented fully in the test plan referenced in Appendix B. This test plan contains all of the tests I plan to run, along with the configurations of the tests, number of users, test types and definitions.
These tests will be run from a 'load injector': a (LAN) server I will build which will run the tests against the virtual and physical servers.
o The different types of applications I will be running my tests against are as follows:
SQL Database
Web application
o I shall collect performance data using monitoring tools such as:
PerfMon (Windows performance monitoring utility)
ESXi performance monitoring metrics (host level metrics)
The data from these tools is to be collected for the duration of the tests that are run.
Analysis of the data will consist of descriptive comparisons of the main software and hardware component metrics, such as CPU utilisation (hardware) and SQL database transactions (application). These will be described, graphed and tabulated after being analysed. The two servers (virtual and physical) will be closely compared, so every test analysed should have a minimum of two sets of data.
Underlying causes and hypotheses will be presented where inconsistencies occur in test results (e.g., if a test against a SQL DB consumes 100% CPU in test A but only 25% in test B, I will thoroughly investigate the causes behind test A's results and present my theories to the reader).
2 Outputs Summary
In this section I will briefly go over the outputs that have been produced.
Test Plan
o The test plan defines the test terms and briefly describes the test cases
o The output is a test plan document
o The recipient/end-user will be anyone who would like to scrutinise the research
and/or carry out similar research
o Recipients could use this research to make informed decisions as to what type of
technology to use when deploying an application and/or server.
o The test plan can be found in Appendix B
Analysis of the tests
o One of the main outputs of this research project is the analysis of the tests run.
o The output is in the form of graphs and supporting text
o The recipient/end-user will be anyone who is interested and/or researching the
field
o Recipients could use this research to make informed decisions as to what type of
technology to use when deploying an application and/or server.
o The analysis can be found in section 5 – Results
o The main findings are that the physical machine outperforms the virtual machines in every test, and that the virtual machine which is over-committed on physical resources does significantly worse than both the physical machine and the non-over-committed virtual machine.
o The physical machine also had a higher throughput rate than both virtual machines in all tests.
3 Literature Review
This chapter summarises the reading undertaken in order to enable me to complete the project.
VMware performance benchmarking technical documentation (VMware, n.d.)
VMware performance benchmarking documentation provided technical information about how test beds were set up as well as how performance was monitored. For example, the Zimbra Collaboration Server Performance on VMware vSphere 5 documentation outlines the methodology and how the performance test environment is set up. I used some of the ideas from here, e.g., keeping the environments as similar as possible in both native and virtual form to make the results as valid as possible.
Also, the VMware documentation clearly states the exact configuration used in the tests. I incorporated this into this research project by stating as much of my configuration as I could.
JMeter user manual (The Apache Software Foundation, n.d.)
JMeter is a free, open-source performance testing tool developed in Java. It is widely
used in performance testing circles and allows a performance tester to be very flexible
with the creation of test cases and scenarios. It allows performance testing of many
known standards such as HTTP, TCP, VT100 terminals, JDBC etc.
As there is a lot to know about the configuration and set-up of the scenarios, I often referred back to the JMeter user manual to learn what a certain configuration or option did.
Microsoft TechNet library – Performance Monitor Getting Started Guide for Windows Server
2008 (Microsoft, 2007)
Performance Monitor is a tool built in to all modern versions of Windows. It allows you to define performance metrics to monitor on a system, log that data to a file on a schedule and output it to a CSV file. There are hundreds of performance metrics to choose from, so I chose the main ones such as CPU, disk and memory, plus application-specific metrics such as SQL processor time.
The TechNet library has an extensive list of the performance monitoring metrics and describes their format and how they are collected. For example, some metrics are averaged over the collection interval (15 seconds by default) while others are just 'snapshots' of the performance data. For analysis purposes, it is important to understand how the data is formatted; otherwise there could be questions over the validity of the results.
Frontiers of High Performance Computing and Networking – ISPA 2006 Workshops (Min, Di
Martino, Yang, Guo, & Ruenger, 2006)
This book studies the performance and scalability of compute intensive commercial
server workloads (specifically a Java server workload benchmark called
SPECjbb2005). Many of my methodologies were derived from this book; for example,
how tests were set-up, what type of data to log, what types of tests to run. The below
text extract is the methodology used by the authors – it is very similar to the
methodology I used. i.e., employ a platform running the VMware hypervisor, run
performance monitoring tools, run the tests on the native and virtual platforms and then
analyse the results.
“Our measurement-based methodology employs an Intel Xeon platform running with
[…] We run out workloads on the Xen hypervisor and employ several performance
tools […] to extract overall system performance […] and architectural behaviour […].
For comparison, we also measure the performance of SPECjbb2006 in a native (non-
virtualized) platform. We expect that the findings from out [sic] study will be useful to
VMM architects and platform architects as it provides insight into performance
bottlenecks” (page 465)
4 Method
4.1 Analysis
I have worked in the performance testing and capacity management field for just over three years, and a trend I have noticed is that many organisations are moving their applications from traditional, physical infrastructure to virtualised environments such as Amazon's AWS and Microsoft's Windows Azure. Elastic capacity, easier management of product development and pricing are usually the key reasons for moving from physical to virtual.
IT-savvy people have long stated that virtualisation carries an overhead compared to physical hardware and throw a percentage into the air (one I hear often is 5-10%), but there is rarely any up-to-date technical analysis to back these figures or statements up.
Therefore, for this research project I decided to design test beds, load test scenarios and performance monitoring in order to analyse the data and see if I could quantify virtualisation overhead on more modern IT infrastructure (more modern CPU architecture) and software (the virtualisation hypervisor).
During my analysis of the field of virtualisation, and from my own experience, I learnt that there are more variations of hardware set-ups, configurations, types of workload (for example, a storage server or an application that serves mathematically intensive transactions) and workload configurations than I could possibly test in the given timeframe. Therefore a caveat of this research is that the conclusions should be taken only as a guideline; performance testing/benchmarking should be performed to fit one's own environment and set-up.
4.2 Design and Implementation
Three types of scenarios were designed for this research project. Scenario 1 tests against the physical infrastructure. Scenario 2 tests against the virtual infrastructure with no over-commitment, and scenario 3 was inspired by how public cloud providers maximise their hardware utilisation by using a virtualisation technique called 'over-commitment' or, in other words, over-contention of the physical resources.
What this means in practice is that a physical resource can be shared by multiple virtual
machines; resulting in cheaper running costs for the cloud provider because they are
maximising utilisation on hardware already provisioned.
For example, let's pretend I provide computing resources to customers and I currently have one physical machine with a quad-core processor and 8 GB of RAM. If customer 1 decides they want to put a 4-core, 8GB virtual machine on my host, then they have used up 100% of the physical capacity, meaning no one else can put a VM on the same host. Realistically speaking, it is unlikely that this customer will always use 100% of the processor time and 100% of the memory allocation. This means there is potential for a lot of hardware waste.
This is where over-commitment comes in (which most, if not all, public cloud providers use): customer 2 can come along and provision a second VM with, say, 2 cores and 4GB of RAM (depending on how much over-commitment the hypervisor allows, which is set by the administrator). This means there is a higher likelihood of hardware utilisation and less 'waste'. Of course, it also means there is more chance of contention; the worst case is when customer 1 and customer 2 are both highly utilised at the same time.
From a technical point of view, this means the hypervisor shares hardware resources between virtual machines, which has a performance impact on each virtual machine.
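The capacity arithmetic in the example above can be sketched as follows. The numbers mirror the hypothetical quad-core, 8 GB host with two customer VMs; the over-commitment ratio is a standard definition (promised virtual capacity divided by physical capacity), not a figure taken from any specific hypervisor.

```python
# Illustrative over-commitment arithmetic for the hypothetical host above:
# 4 physical cores, 8 GB RAM, and two customer VMs.
host = {"cores": 4, "ram_gb": 8}
vms = [
    {"name": "customer 1", "cores": 4, "ram_gb": 8},  # fills the host on its own
    {"name": "customer 2", "cores": 2, "ram_gb": 4},  # only possible via over-commitment
]

allocated_cores = sum(vm["cores"] for vm in vms)
allocated_ram = sum(vm["ram_gb"] for vm in vms)

# Over-commitment ratio: virtual capacity promised per unit of physical
# capacity. Anything above 1.0 means the host is over-committed.
cpu_ratio = allocated_cores / host["cores"]  # 6 / 4 = 1.5
ram_ratio = allocated_ram / host["ram_gb"]   # 12 / 8 = 1.5

print(f"CPU over-commitment ratio: {cpu_ratio}")
print(f"RAM over-commitment ratio: {ram_ratio}")
```

A ratio of 1.5 is safe only as long as the two customers do not both demand their full allocation at once, which is exactly the contention scenario tested in scenario 3.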
Each test within a scenario is run three times to account for variation from different runs.
Scenario 1 – Physical environment
Test 1 – Web Application Testing
Test 2 – Database Application Testing
Scenario 2 – Virtual Machine environment
Test 1 – Web Application Testing
Test 2 – Database Application Testing
Scenario 3 – Virtual Machine with over-commit environment
Test 1 – Web Application Testing
Test 2 – Database Application Testing
I designed the test scenarios for the web and database applications using Apache JMeter, the popular open-source performance testing tool, as explained in the Literature Review section.
The test scenarios will be available as a .jmx file as part of this research.
The test plan (Appendix B) details the two workloads I will be testing, the type of tests I will
be running along with the test terminology.
Database scenario:
The database scenario consists of a number of typical database transactions across two tables: SQLStress.Users and SQLStress.Orders.
The Orders table was pre-populated with 1 million rows of test data and the Users table was left empty.
First I needed to prepare the database and tables. I created the database (named SQLStress) using the SQL Server Management Studio interface. I created the tables and fields using the SQL script shown in Figure 4-1.
Figure 4-1
CREATE TABLE SQLStress.Users
(
Username varchar(64),
FirstName varchar(255),
LastName varchar(255),
UniqueId varchar(255),
Password binary(20)
)
CREATE TABLE SQLStress.Orders
(
Username varchar(64),
OrderName varchar(255),
OrderId varchar(64)
)
Then there was a need to populate the Orders table so that I could perform somewhat realistic inner joins between the two tables.
I ran the script in Figure 4-2 one million times to generate the required data.
Figure 4-2
INSERT INTO SQLStress.Orders (Username,OrderName,OrderId)
VALUES ('${userName}','This is an order
description','${uniqueId}')
The parameters ( ${parameterName} ) are passed through a .csv file. An example of the data
in the .csv file can be seen in Figure 4-3
Figure 4-3
userName,firstName,lastName,uniqueId,password
username1000001,firstname,lastname,1000001,username1000001
username1000002,firstname,lastname,1000002,username1000002
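As an illustration, a parameter file of this shape could be generated with a short script like the one below. The file name, starting ID and row count are assumptions made for the sketch (the demonstration call writes only two rows), not values taken from the actual test plan.

```python
import csv

# Sketch: generate a JMeter-style parameter file like the one in Figure 4-3.
# The path, start_id and rows arguments are illustrative assumptions; the
# real file held parameters for the 1,000,000 pre-populated Orders rows.
def write_param_file(path, start_id=1000001, rows=1000000):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["userName", "firstName", "lastName", "uniqueId", "password"])
        for i in range(start_id, start_id + rows):
            # Mirrors Figure 4-3: the password is simply the username.
            writer.writerow([f"username{i}", "firstname", "lastname", i, f"username{i}"])

# Small demonstration run (2 rows) rather than the full million.
write_param_file("sqlstress_params.csv", rows=2)
```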
After this preparation of the database, there is the actual test scenario.
The important configuration to note can be seen in Figure 4-4 (the Literature Review section
provides a reference to the JMeter user manual which further explains the terms):
Figure 4-4
Configuration | Value | Notes
Number of threads (users) | 50 | The number of virtual users active in the test.
Ramp-Up Period (in seconds) | 10 | How long before all threads will start running. Ramping up too fast can cause issues (for example, starting 500 threads in 1 second on a web server will likely result in a denial of service).
Constant delay (in milliseconds) | 500 | A constant wait of 500 ms between transactions to make the throughput profile more realistic; on a website, for example, it is realistic for a user to wait a short time before clicking the next transaction.
Random delay (in milliseconds) | 500 | Goes hand-in-hand with the constant delay: a random delay of up to 500 ms, making the total delay between transactions between 500 ms (the constant delay sets the minimum) and 1000 ms (constant 500 ms plus a random delay of up to 500 ms).
Duration of test (in minutes) | 60 | Each test ran for a whole hour.
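The combined effect of the constant and random delays can be sketched as follows. The function name is an illustrative choice; JMeter implements this behaviour internally through its timer elements.

```python
import random

# Sketch of the think-time logic described in Figure 4-4: a constant delay
# of 500 ms plus a uniformly random delay of up to 500 ms, giving a total
# pause of between 500 ms and 1000 ms before each transaction.
def think_time_ms(constant_ms=500, random_ms=500):
    return constant_ms + random.uniform(0, random_ms)
```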
There are a total of 5 transactions per loop, so each thread will execute all 5 of the transactions below (with the delay between each transaction) and then start again.
Three SELECT statements
Figure 4-5
SELECT TOP 100 *
FROM ${table}
WHERE FirstName = 'firstname'
One INSERT statement
Figure 4-6
DECLARE @HashThis binary(20);
SELECT @HashThis = CONVERT(binary(20),'${password}');
INSERT INTO SQLStress.Users
  (Username,FirstName,LastName,UniqueId,Password)
VALUES
  ('${userName}','${firstName}','${lastName}',${uniqueId},
   HASHBYTES('SHA1', @HashThis))
One INNERJOIN statement
Figure 4-7
SELECT SQLStress.Orders.OrderId,
SQLStress.Users.Username
FROM SQLStress.Orders
INNER JOIN SQLStress.Users
ON SQLStress.Orders.Username= SQLStress.Users.Username
WHERE SQLStress.Users.Username = '${userName}'
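To make the loop structure concrete, here is a sketch of one virtual user's iteration, with SQLite standing in for SQL Server purely so the sketch is self-contained: TOP 100 becomes LIMIT 100, HASHBYTES('SHA1', ...) becomes hashlib.sha1, and since the ${table} values for the three SELECTs are not specified above, Users is assumed. The real tests issued these statements from JMeter, not from Python.

```python
import hashlib
import sqlite3

# SQLite stand-in for the SQL Server schema in Figure 4-1 (types omitted,
# as SQLite is dynamically typed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Username, FirstName, LastName, UniqueId, Password)")
conn.execute("CREATE TABLE Orders (Username, OrderName, OrderId)")

def run_iteration(conn, user):
    # Three SELECTs (Figure 4-5); the ${table} parameter is assumed to be Users.
    for _ in range(3):
        conn.execute(
            "SELECT * FROM Users WHERE FirstName = 'firstname' LIMIT 100").fetchall()
    # One INSERT (Figure 4-6), hashing the password as HASHBYTES('SHA1', ...) would.
    pw_hash = hashlib.sha1(user["password"].encode()).digest()
    conn.execute("INSERT INTO Users VALUES (?, ?, ?, ?, ?)",
                 (user["userName"], user["firstName"], user["lastName"],
                  user["uniqueId"], pw_hash))
    # One inner join (Figure 4-7).
    conn.execute(
        "SELECT Orders.OrderId, Users.Username FROM Orders "
        "INNER JOIN Users ON Orders.Username = Users.Username "
        "WHERE Users.Username = ?", (user["userName"],)).fetchall()

user = {"userName": "username1000001", "firstName": "firstname",
        "lastName": "lastname", "uniqueId": "1000001", "password": "username1000001"}
run_iteration(conn, user)
```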
Web Application scenario:
The web application is a basic prime number generator created in ASP.NET 4. The code can be seen in Appendix D.
In simple terms, when the user visits the page, they are served a static page with a text field. In this field the user enters a number which defines how many prime numbers to generate.
Loading the static page is the GET request; entering the number, sending it to the web server and getting a response back is the POST request.
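As an illustration of the workload itself (the actual implementation is the ASP.NET 4 code in Appendix D, not this), a prime number generator of this kind can be sketched as:

```python
# Minimal sketch of the web application's compute workload: generate the
# first n prime numbers by trial division against the primes found so far.
def first_primes(n):
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no earlier prime divides it evenly
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

print(first_primes(5))  # [2, 3, 5, 7, 11]
```

The POST parameter td1 (shown in Figure 4-10) carries the value of n, so larger inputs make each POST request more CPU-intensive, which is what makes this a useful stress workload.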
The important configuration to note can be seen in Figure 4-8:
Figure 4-8
Configuration | Value | Notes
Number of threads (users) | 50 | The number of virtual users active in the test.
Ramp-Up Period (in seconds) | 10 | How long before all threads will start running. Ramping up too fast can cause issues (for example, starting 500 threads in 1 second on a web server will likely result in a denial of service).
Constant delay (in milliseconds) | 500 | A constant wait of 500 ms between transactions to make the throughput profile more realistic; on a website, for example, it is realistic for a user to wait a short time before clicking the next transaction.
Random delay (in milliseconds) | 500 | Goes hand-in-hand with the constant delay: a random delay of up to 500 ms, making the total delay between transactions between 500 ms (the constant delay sets the minimum) and 1000 ms (constant 500 ms plus a random delay of up to 500 ms).
Duration of test (in minutes) | 60 | Each test ran for a whole hour.
Figures 4-9 and 4-10 show the two web transactions that were executed in the testing.
The GET transaction:
Figure 4-9
http://${server}/stress
Method: GET
The POST transaction:
Figure 4-10
http://${server}/stress
Method: POST
Parameters:
  __VIEWSTATE: a dynamic value used by ASP.NET web pages to persist changes to the state of web forms across post-backs
  __EVENTVALIDATION: ensures events raised on the client originate from the controls rendered by ASP.NET
  td1: 500 – a variable defined in Default.aspx which holds the number of prime numbers to generate
  b1: Submit – the submit button
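As a sketch of what the POST transaction involves outside JMeter, the fragment below shows how the hidden __VIEWSTATE and __EVENTVALIDATION fields could be scraped from the GET response and echoed back in the POST body. The regex, the helper name and the sample HTML fragment are all illustrative assumptions; JMeter achieves the same thing with its Regular Expression Extractor.

```python
import re

# Sketch: build the POST payload of Figure 4-10. ASP.NET embeds
# __VIEWSTATE and __EVENTVALIDATION as hidden form fields in the GET
# response, so a test script must extract them and send them back.
def build_post_payload(html, primes_to_generate=500):
    fields = dict(re.findall(
        r'name="(__VIEWSTATE|__EVENTVALIDATION)"\s+value="([^"]*)"', html))
    fields["td1"] = str(primes_to_generate)  # number of primes to generate
    fields["b1"] = "Submit"                  # the submit button
    return fields

# Made-up HTML fragment standing in for the real GET response.
sample_html = ('<input type="hidden" name="__VIEWSTATE" value="dDwxMjM=" />'
               '<input type="hidden" name="__EVENTVALIDATION" value="abc123" />')
payload = build_post_payload(sample_html)
```

If the extracted values are not replayed, ASP.NET rejects the POST, which is why the GET must always precede the POST in each loop.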
Infrastructure and Server Set-up:
The physical infrastructure runs on the following specifications:
Dell PowerEdge 1950 rack server
Two Intel Xeon E5345 Processors
Eight 1GB IBM RAM 667MHz ECC Buffered PC2-5300F FRU: 39M5784 P/N:
38L5903
Two 73.4 GB 10K RPM, IBM eServer ST973401SS, SAS drives
The virtual infrastructure runs on the following specifications:
VMware ESXi, 5.1.0, 799733
The virtual machines will run on the same underlying physical hardware as described
above
The reason there are two SAS hard drives is that one has Windows Server 2008 R2 installed, while the other has VMware ESXi installed as the hypervisor, with Windows Server 2008 R2 installed as a virtual machine on that hypervisor. When I want to run a test on the virtual or physical set-up, I simply swap the drives.
Figure 4-11 is a table of the server configurations during each scenario and test.
Figure 4-11
Scenario | Test | Configuration
Scenario 1 (Physical Machine) and Scenario 2 (Virtual Machine) | Web Application and Database Application | Two CPUs, 8 cores, 8GB RAM; one Windows Server 2008 R2 Enterprise
Scenario 3 (Virtual Machine with over-commit) | Web Application and Database Application | Two CPUs, 8 cores, 8GB RAM; one Windows Server 2008 R2 Enterprise, plus 1 CPU, 4 cores, 4GB RAM; one Ubuntu Desktop
Database Application Set-up:
SQL Server 2008 R2 Enterprise
Default configuration
Web Application Set-up:
IIS 7.5
Default configuration
4.3 Evaluation
I completed all of the evaluation and analysis in Microsoft Excel, using formulas, pivot tables and graphs to analyse and present the data in this research paper.
All of the data will be provided as part of the research so that there is opportunity to expand
on this research or customise the output to suit the reader.
Please see Appendix D for the location of these files.
5 Results
In this section of the research project, I will discuss all of the data I have analysed.
The types of results I will show are as follows:
Database Test Results
o 90th, 95th and 99th Percentiles for Transaction Response Times (3 graphs for the
database transactions – SELECT, INSERT and INNERJOIN)
o Full distribution line graph of response times (3 graphs for the database
transactions – SELECT, INSERT and INNERJOIN)
o CPU (1 graph showing average CPU utilisation across the three environments)
o Memory (1 graph showing average working set for the SQL Server process for
all three environments)
o Disk (1 graph showing disk seconds per write across the three environments)
Web Test Results
o 90th, 95th and 99th Percentiles for Transaction Response Times (2 graphs for the
web transactions – GET and POST)
o Full distribution line graph of response times (2 graphs for the web transactions
– GET and POST)
o CPU (1 graph showing average CPU utilisation across the three environments)
o Memory (1 graph showing average working set for the W3WP process for all
three environments)
o Disk (1 graph showing disk seconds per write across the three environments)
5.1 Database Test Results
The first set of results are percentiles of each individual database transaction's response time; e.g., 90% of users will see a response time of x milliseconds or less for the INNERJOIN database transaction.
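For illustration, percentiles of this kind can be computed with the nearest-rank method, sketched below on made-up response times. (The figures in this report were produced in Excel, whose PERCENTILE function interpolates between ranks, so its values can differ slightly; the data here is not from the actual tests.)

```python
import math

# Nearest-rank percentile: the value below which at least pct% of the
# sorted observations fall.
def percentile(values, pct):
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# Made-up response times in milliseconds, purely for illustration.
response_times_ms = [12, 15, 11, 14, 90, 13, 16, 12, 250, 14]
for p in (90, 95, 99):
    print(f"{p}th percentile: {percentile(response_times_ms, p)} ms")
```

High percentiles are used precisely because they expose the slow tail (the 90 ms and 250 ms outliers above) that an average would hide.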
As per Figure 5-1, the response times for the INNERJOIN SQL transaction show that the physical database server performs slightly faster than the virtual machine and significantly faster than the over-committed virtual machine.
Across the percentiles, the transactions on the virtual environment are 4.6% slower than the physical, and the transactions on the over-committed virtual machine are 33.8% slower than the physical.
Figure 5-1
In Figure 5-2, we can see the average response times of the INNERJOIN statement throughout
the duration of the test (60 minutes).
The pattern we observe here is similar to the one in Figure 5-1 where the Physical machine
performs better than the other two.
We also see here that the behaviour of the over-committed virtual machine is more erratic when
compared to the other two which seem more stable.
This behaviour can be explained by CPU scheduling on the hypervisor due to contention for
the physical resources.
Figure 5-2
Figure 5-3 indicates that the response times for the SELECT SQL transaction follow a similar pattern, whereby the physical performs fastest, followed by the virtual and then the over-committed virtual.
The average response times for the 90th and 95th percentiles show that the virtual machine is approximately 6% slower than the physical machine.
Analysis also shows that the over-committed virtual machine is 50.6% slower than the physical machine.
Figure 5-3
In Figure 5-4, we see consistently higher response times for both virtual machines.
Again we can observe the erratic behaviour of the over-committed virtual machine.
Figure 5-4
Figure 5-5 shows the response times for the INSERT SQL transaction. Again a similar pattern emerges, whereby the virtual machine is on average 5.7% slower at the 90th and 95th percentiles.
An unexpected result here is that the 99th percentile shows the virtual machine to be 1 millisecond faster than the physical machine; however, the difference is too small to be of any significance.
The over-committed virtual machine is consistently slower than the physical machine, at an average of 46% slower.
Figure 5-5
In Figure 5-6, we see consistently higher response times for both virtual machines. Again there
is erratic behaviour of the over-committed virtual machine.
Figure 5-6
Figures 5-7, 5-8 and 5-9 show the transactions per minute for each SQL transaction. As the response times in the graphs above indicate that the physical performs best, the virtual slightly worse and the over-committed virtual significantly worse, we would expect to see this reflected in the transaction throughput; i.e., if a transaction completes quickly then we should see more of them, and if it is slow then we should see fewer, due to the longer processing time.
This is exactly what we see: the physical machine's throughput only slightly exceeds the virtual machine's, and significantly exceeds the over-committed virtual machine's.
Average calculations across the three SQL transactions show that the virtual machine achieves 1.4-1.5% lower transaction throughput per minute and the over-committed virtual machine achieves 9.9% lower transaction throughput per minute.
This means that for a system which pushes through high volumes of transactions, the
performance and throughput of these transactions can be impacted by using virtualisation and
public cloud technologies.
Figure 5-7
Figure 5-8
Figure 5-9
Figure 5-10 indicates that CPU utilisation in all three environments is almost exactly the same: approximately 372-374% average CPU utilisation (out of 800%, because the systems utilise 8 cores).
Figure 5-10
The data from Figure 5-11 was taken from vSphere. This data is collected directly from the hypervisor
and shows us the CPU utilisation (out of 100%) for each object.
As per the table in the Method section (Scenario 3), we know that the hardware allocation was as
follows:
Over-committed VM = 2 CPUs, 8 Cores, 8GB RAM
Ubuntu Desktop = 1 CPU, 4 Cores, 4GB RAM
To briefly explain how to interpret the graph, I will walk through the Windows Server 2008 R2
virtual machine. We can see that the Windows Server 2008 R2 server was running at an average of
44% CPU utilisation; this means 44% of the 2 CPUs allocated to it. This is consistent with the
previous page, where the CPU utilisation was approximately 374% out of 800%.
The key point to take from this graph is the average CPU utilisation of the Ubuntu virtual
machine, as it tells us how much contention was present during the test.
The figure is 57% utilised, which equates to 28.5% over-contention on the test Windows Server VM.
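The conversion from a per-VM utilisation figure to a share of the host follows from the core counts in the allocation table above; a small Python sketch of the arithmetic:

```python
def host_share(vm_cpu_pct, vm_cores, host_cores):
    # Convert a VM's CPU utilisation (% of its own allocation)
    # into a percentage of the whole host's capacity.
    return vm_cpu_pct * vm_cores / host_cores

# The Ubuntu VM: 57% of its 4 allocated cores on the 8-core host.
print(host_share(57, 4, 8))  # 28.5
```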
Figure 5-11
Figure 5-12 shows the average SQL Server working set. There is nothing unusual to report here, as
the memory usage of the three servers is between 1GB and 1.2GB.
Figure 5-12
Figure 5-13 shows the disk seconds per write, i.e. how long it takes to complete a write to disk.
Although the numbers may look insignificant, remember that on a high-throughput production
system there could easily be hundreds of thousands or even millions of writes per second, where
these numbers could mean the difference between low and high response times.
The average disk seconds per write on the physical machine was 0.0035 seconds, or 3.5ms.
On the virtual machine this was 4.7ms, and on the over-committed virtual machine it was 5.3ms.
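The relative overheads implied by these averages reduce to a simple ratio against the physical machine's figure; a short Python sketch, which reproduces the 34% and 51% overheads quoted in the summary:

```python
def overhead_pct(baseline_ms, observed_ms):
    # How much longer (in %) a write takes relative to the baseline.
    return (observed_ms / baseline_ms - 1) * 100.0

print(round(overhead_pct(3.5, 4.7)))  # 34 -> virtual machine
print(round(overhead_pct(3.5, 5.3)))  # 51 -> over-committed virtual machine
```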
Figure 5-13
5.2 Summary
In summary, the analysis shows that the physical machine is the fastest in terms of transaction
response times, throughput and component metrics, compared with the two virtual machines.
We see that the INNERJOIN, SELECT and INSERT transactions on the virtual machine take an
average of 4.6%, 6% and 5.7% longer to complete, respectively. Throughput is also 1.5% lower on
the virtual machine.
The over-committed virtual machine takes 33.8%, 50.6% and 46% longer to complete the same
transactions and achieves approximately 10% lower throughput.
Lastly, the analysis shows that the component metric with the most impact on performance is the
seconds per disk write, as shown in Figure 5-13. Briefly, the virtual machine took an average of
34% longer to perform a disk write, and the over-committed virtual machine took an average of
51% longer, compared with the physical machine.
The numbers are in the milliseconds; however, on a system with a high number of writes (and
potentially reads, as I suspect a similar performance degradation would apply) this could cause a
disk bottleneck and give end-users poor performance.
5.3 Web Test Results
The first set of results is the percentiles of each individual web transaction's response time.
E.g., 90% of users will see a transaction response time of x milliseconds for the GET HTTP
transaction.
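A percentile of this kind can be computed with a simple nearest-rank method; a short Python sketch (the sample response times below are invented purely for illustration, not taken from my results):

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest value such that at least
    # p% of the samples are less than or equal to it.
    ordered = sorted(samples)
    rank = math.ceil(p / 100.0 * len(ordered))
    return ordered[max(rank - 1, 0)]

times_ms = [12, 15, 11, 14, 13, 18, 45, 16, 13, 14]
print(percentile(times_ms, 90))  # 18
print(percentile(times_ms, 99))  # 45
```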
Figure 5-14 shows the response times for the GET transaction. We can see that the physical
environment outperforms the two virtual environments, as expected, and that the virtual
environment outperforms the over-committed virtual environment, which is also expected.
Across the percentile figures, the virtual machine is on average 210% slower at completing the
transaction than the physical machine, and the over-committed virtual machine is 596% slower.
Figure 5-14
Figure 5-15 shows very distinctly the differences in performance between the three environments.
Another interesting pattern to note is that the transaction response time is quite steady on the
physical environment, whereas the response times on the two virtual machines are not only higher
but also more erratic. This suggests that on a web server with more dynamic, content-rich pages,
users could experience highly variable response times.
Figure 5-15
Figure 5-16 shows the response times for the POST transaction. Again, the physical machine
outperforms the two virtual environments; however, the two virtual environments perform very
similarly and cross over at the 99th percentile statistic.
My calculations show that, across the percentiles, the virtual environment was an average of
33.7% slower than the physical environment and the over-committed virtual machine was 36.3%
slower.
Figure 5-16
Figure 5-17 shows the response time graph of the POST request. The differences in response time
are not as substantial as they are for the GET request; however, you can still see that the physical
machine outperforms the two virtual machines.
The erratic pattern also exists here for the two virtual machines.
Figure 5-17
Figures 5-18 and 5-19 show the transactions per minute for each web transaction (GET and
POST). As the response time graphs above indicate that the physical machine performs best, the
virtual machine slightly worse and the over-committed virtual machine significantly worse, we
would expect this to be reflected in the transaction throughput: if a transaction completes quickly,
we should see more of them per minute; if it is slow, we should see fewer due to the longer
processing time.
This is exactly what we see here: the physical machine's throughput is higher than that of the
virtual machine and significantly higher than that of the over-committed virtual machine.
Average calculations show that the virtual machine achieves 2.1% lower transaction throughput
per minute and the over-committed virtual machine achieves 4% lower transaction throughput
per minute.
Figure 5-18
Figure 5-19
Figure 5-20 indicates that the CPU utilisation profile is unlike what we saw for the database tests
(Figure 5-10), where CPU utilisation was quite static, with all three environments within a 2%
range of each other. Here you can immediately see that the over-committed VM is not only
consuming more CPU (approximately 40-50% more, so about half a CPU core) but also has an
erratic profile.
Figure 5-20
The only explanation I could find for this profile is CPU stolen time, which was recorded in
Windows PerfMon via the ESX hypervisor's APIs.
CPU stolen time tells us the time during which the VM was ready to run but was not scheduled to
run (potentially because of the over-contention of physical resources caused by the Ubuntu VM).
We can see the CPU stolen milliseconds in Figure 5-21, which shows an erratic profile during the
three tests on the over-committed virtual machine. Although the wait times may seem
insignificant, any delay is a direct hit on the performance of the virtual machine.
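One way to put a stolen-time reading in perspective is to express it as a fraction of the sampling interval; a short Python sketch (the sample values are hypothetical, not taken from my results):

```python
def stolen_share_pct(stolen_ms, interval_ms):
    # Fraction of a sampling interval during which the VM was
    # ready to run but not scheduled, expressed as a percentage.
    return stolen_ms / interval_ms * 100.0

# Hypothetical: 75ms stolen within a 15-second (15000ms) sample interval
# means the VM lost 0.5% of its potential CPU time in that window.
print(stolen_share_pct(75, 15000))  # 0.5
```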
Figure 5-21
The data in Figure 5-22 was taken from vSphere and is similar to the data discussed for
Figure 5-11.
It is collected directly from the hypervisor and shows the CPU utilisation (out of 100%) for each
object.
As per the table in the Method section (Scenario 3), we know that the hardware allocation was
as follows:
Over-committed VM = 2 CPUs, 8 Cores, 8GB RAM
Ubuntu Desktop = 1 CPU, 4 Cores, 4GB RAM
The key point to take from this graph is the average CPU utilisation of the Ubuntu virtual
machine, as it tells us how much contention was present during the test.
The figure is 50% utilised, which equates to 25% over-contention on the test Windows Server
VM.
Figure 5-22
There is nothing unusual to report for the web server working set.
Figure 5-23
The disk activity in Figure 5-24 during the web testing is similar to the profile we saw in the
database test analysis.
The average disk seconds per write on the physical machine was 0.0068 seconds, or 6.8ms.
On the virtual machine this was 7.9ms, and on the over-committed virtual machine it was 9.6ms.
Figure 5-24
5.4 Summary
In summary, the physical machine outperformed the two virtual machines; a result similar to that
of the database server testing (Summary 5.2).
We see that the GET and POST transactions on the virtual machine take an average of 210% and
33.7% longer to complete, respectively. Throughput is also 2.1% lower on the virtual machine.
The over-committed virtual machine takes 596% and 36.3% longer to complete the same
transactions and achieves approximately 4% lower throughput.
Component performance analysis also shows that the over-committed virtual machine's CPU
utilisation is about half a core higher, as well as far more erratic, than that of the other two
machines (Figure 5-20).
The last component analysis concerns the seconds per disk write. The results here are similar to
those in the database testing (Summary 5.2). The virtual machine took an average of 16% longer
to complete a disk write, and the over-committed virtual machine took an average of 41% longer,
compared with the physical machine.
6 Conclusions and Discussions
6.1 Introduction
In this chapter I will tie up and discuss the main project themes.
6.2 Project Objectives and Research Questions
Pages 5 and 6 of this report outline the research question and objectives.
Briefly, the objectives were to run performance tests against a web server and a SQL server with
performance monitoring enabled, and to analyse the test results and performance log data.
The output of this analysis was used to answer the research question: which environment
performs better for the two workloads (the web application and the SQL server)?
All of the objectives were successfully met and the research question answered, as I will discuss
in section 6.3.
6.3 General Conclusions
Here I will discuss my main findings and conclusions of the research project.
My main finding is that the physical machine performed far better in every test run than the two
virtual machines.
The virtual machine with no over-commitment performed only slightly worse than the physical
machine, while the over-committed virtual machine performed the worst in every test.
This answers the research question, as I can definitively say that the physical machine was better
for performance for the two tested workloads.
However, I cannot give a definitive number or percentage for how much worse the virtual
machines perform, as the results varied between the two scenarios tested (SQL and web
application).
This is to be expected, as they are two different workloads and the hypervisor may handle one
type of load better than another.
What I can say is that, generally speaking, the overhead of virtualization management is marginal
(a few percent), while the overhead of an over-committed virtual machine can be significantly
higher depending on the level of over-commitment: the higher the over-commitment, the higher
the contention, which means greater performance degradation and more sporadic activity.
Of course, it all depends on the system configuration, hypervisor configuration, levels of
over-commitment and how modern the hardware is (more modern CPU architectures support
more virtualization instructions, which mean less hypervisor overhead).
It is also important to note that over-commitment is not inherently a bad thing. Over-commitment,
if done well, can be an excellent use of capacity, and the only way it can be done well is if the
workloads and systems are understood properly. For example, if I know that System A will
consume resource x at a certain time of day, there is no problem allocating resource x to
System B at other times of the day. Unfortunately, in a public cloud environment (such as
Amazon Web Services) it is almost impossible to characterize the systems and workloads, making
it very difficult to over-commit well; this causes contention for the resources.
My final point and recommendation is in line with the authors of the book Frontiers of High
Performance Computing and Networking (Min, Di Martino, Yang, Guo, & Ruenger, 2006), who
say:
“[…] virtualization is currently not used in high-performance computing (HPC) environments.
One reason for this is the perception that the remaining overhead that VMMs introduce is
unacceptable for performance-critical applications and systems” (page 475)
I strongly believe that, at this moment in time, virtualization is not the way forward for
high-performance computing that depends on millisecond responses.
For all other, non-HPC environments, virtualization should be considered, as it has many
potential benefits for an organization's IT.
6.4 Implications for Future Work
I believe work in this area could be extended by investing more time in testing different types of
workload scenarios, using more modern hardware and trying different levels of over-commitment
on the test virtual machines. Another interesting test would be to compare different hypervisors,
to see whether hypervisor x performs workload y more efficiently than hypervisor z.
6.5 Project Management
In this section, I will discuss my own progress on the project, including management and control,
what I have learnt and what I could have done differently.
Management and control of the project was achieved by keeping to the deadlines that my
supervisor and I had set and agreed upon. If a deadline could not be met, it was extended along
with a reason why, usually followed by a brief meeting with my supervisor.
During the project I learnt many things, but one of the most important to me was the project
management aspect: making sure I could meet my deadlines and, if not, keeping everyone
involved (my supervisor in this case) updated so that they could provide useful advice or
suggestions.
I learnt that it is important to start projects as early as possible, as not everything goes to plan,
and the more time you have, the less pressure there is to solve any issues you come across. If I
were to do this project again, this is one area I would definitely take more seriously, as it was a
real issue during my project: tests were re-run many times and test tools were changed due to
inconsistencies in the results.
7 Glossary
Term Definition
Over-commitment Over-commitment is a virtualisation term for allocating
more of a physical resource to virtual machines than the
host actually has.
For example, on a host which has a total capacity of 8GB
RAM, you can give 8GB RAM to each of two virtual machines.
The host's memory would be over-committed in this case.
Contention Usually when there is over-commitment on physical
hardware, there is contention.
This means that two or more virtual machines are
contending for the same physical resource.
High performance computing or HPC HPC systems are typically used to solve advanced
computational problems and depend on near real-time
response times and stability; something which cannot be
guaranteed by virtual machines at this time.
Example applications which can be classified as HPC
uses are as follows:
Data mining
Simulations
Modelling
Visualisation of complex data
Rapid mathematical calculations
Performance monitor or PerfMon This is a built-in Windows tool for defining and collecting
performance metrics.
8 Appendices
A. Please see the Project Definition Document titled Volume 1 – Project Definition Document which will be attached after these Appendices
B. Windows Azure and Amazon (AWS) graphical representations of launching instances
C. Test Plan
Please see the Test Plan document titled Volume 2 – Test Plan which will be attached after these Appendices
D. Research results Excel files
Please see the Excel documents provided on CD as part of the research submission
E. Prime Number ASP.NET framework version 4 application
Default.aspx
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default"
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
<title></title>
</head>
<body>
<form id="form1" runat="server">
<div>
Enter number of prime numbers needed: <asp:TextBox id="td1" runat="server" /><br />
<asp:Button ID="b1" OnClick="submitEvent" Text="Submit" runat="server" /><br /><br />
<asp:ListBox ID="lb1" runat="server" AutoPostBack="true"></asp:ListBox>
</div>
</form>
</body>
</html>
Default.aspx.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
public partial class _Default : System.Web.UI.Page
{
    // Pre-populate the text box with a default value on first load.
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!IsPostBack)
        {
            td1.Text = "10";
        }
    }

    // Generate the requested number of primes and list them on the page.
    protected void submitEvent(object sender, EventArgs e)
    {
        int needed = 10;
        if (!string.IsNullOrEmpty(td1.Text))
        {
            Int32.TryParse(td1.Text, out needed);
        }
        lb1.Items.Clear(); // Remove all items
        foreach (int a in generatePrimeNumbers(needed))
        {
            lb1.Items.Add(a.ToString());
        }
    }

    // Collect `total` primes by scanning upwards from a random starting point.
    private List<int> generatePrimeNumbers(int total = 10)
    {
        int rand = new Random().Next();
        int count = 0;
        List<int> primeNumbers = new List<int>();
        while (count != total)
        {
            if (isPrime(rand))
            {
                primeNumbers.Add(rand);
                count++;
            }
            rand++;
        }
        return primeNumbers;
    }

    // Trial division: 2 is the only even prime; odd candidates are tested
    // against odd divisors up to sqrt(val).
    private bool isPrime(int val)
    {
        if ((val & 1) == 0)
        {
            return val == 2;
        }
        for (int i = 3; (i * i) <= val; i += 2)
        {
            if ((val % i) == 0)
            {
                return false;
            }
        }
        return val != 1;
    }
}
Web.Config
<?xml version="1.0"?>
<!--
For more information on how to configure your ASP.NET application, please visit
http://go.microsoft.com/fwlink/?LinkId=169433
-->
<configuration>
<system.web>
<compilation debug="true" targetFramework="4.0"/>
</system.web>
</configuration>