  Plausibility of users simulation

niteguy
New Member

Posts: 3
Registered: May 2002

posted 09-23-2002 11:43 AM
Hi all,

I am trying to determine how plausibly automated tools simulate real users. I measured the time to download a web page, as seen from the server.
The page size is 150 kB (plain HTML text plus several images, the largest of which is 64 kB).
With Rational Suite Performance Studio and 1 VU it took 440 ms. With Internet Explorer 5.0 it took 312 ms.
With Sirano STA the difference was even more substantial: STA took 141 ms, IE took 359 ms.
I realize that other parameters also matter for user simulation (for example, round-trip time, the distribution of users over time, etc.). Does anybody have experience in estimating the plausibility of user simulation by automated tools? Maybe the number of threads the tool uses to issue GET requests is an important parameter?
Some details of the experiment: laboratory conditions (low round-trip time); server: Win2000 Server; web server: IIS 5.0; network bandwidth: 100 Mbit/s; time was measured using MS Network Monitor.

Thanks


RSBarber
Moderator

Posts: 852
Registered: Jul 2002

posted 09-23-2002 12:46 PM
What a great topic! There are thousands of things that can contribute to those kinds of differences, not the least of which are threading differences and that (supposedly) minuscule difference in time between when all the information is downloaded and when your browser finishes presenting that information.

Every testing tool, every browser, every platform, every extra application, etc., will have an effect on what the ACTUAL user experience is, and what the ACTUAL user model is.

So, you ask, if I have to account for all these things, how do I ever figure it all out before my application is simply obsolete? That is what makes us Engineers, not simply testers.

In your particular case:

1) Can you tell me exactly what each of those numbers means? I suspect they are not identical measurements. For instance, you cannot compare a Rational VU script timer directly to IE without knowing how long the CPU think times are in the script, and how long they take in the browser on that particular machine.

2) Are these single-shot measurements, or hundreds of measurements conducted under identical circumstances, with very low variances and standard deviations? If you don't have a fairly significant sample size, these numbers are meaningless. The variance you are seeing could be caused by the guy in the next cube downloading an MP3.

3) You are comparing Network Monitor times to perf tool times? Have you really dug in to make sure you are comparing apples to apples here? Have you set the Test Studio scripts to the proper timer extensions to match Network Monitor (i.e. FS_LR or LS_LR etc.)?

In short, yes it is possible to accurately predict end user experience using a load generation tool. If you want to predict in the +/- .25 second range, you will need to be VERY intimate with your tool and your environment and have an EXTREMELY large data sample size.
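
The sample-size point in 2) can be made concrete with a small Python sketch. It is only an illustration: the timings are made up (they are not measurements from this thread), `summarize` is a hypothetical helper, and the 95% interval uses a normal approximation that understates the spread for samples this small.

```python
import math
import statistics


def summarize(samples_ms):
    """Return mean, sample standard deviation, and an approximate 95% half-width."""
    n = len(samples_ms)
    mean = statistics.mean(samples_ms)
    stdev = statistics.stdev(samples_ms)  # sample standard deviation (n - 1)
    # Normal approximation; for small n a Student's t quantile would be wider.
    half_width = 1.96 * stdev / math.sqrt(n)
    return mean, stdev, half_width


# Made-up timings in milliseconds, for illustration only.
tool_ms = [440, 452, 431, 447, 438, 445]
browser_ms = [312, 359, 330, 341, 325, 348]

for name, samples in (("tool", tool_ms), ("browser", browser_ms)):
    mean, stdev, hw = summarize(samples)
    print(f"{name}: mean={mean:.1f} ms, stdev={stdev:.1f} ms, 95% CI ~ +/-{hw:.1f} ms")
```

If the two intervals overlap, the difference between tool and browser may well be noise; if they stay clearly apart as more runs are added, the difference is probably real.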

I'll stop for now. I hope my soap box rambling yields some useful tidbits of information.

Scott

------------------
Scott Barber
NOBLE(STAR
Sr. Performance Engineer
sbarber@noblestar.com
http://www.noblestar.com
http://www.perftestplus.com


Ian
Advanced

Posts: 175
Registered: Sep 2001

posted 09-23-2002 03:31 PM
"estimating plausibly of users simulations by automated tools?"

I use 'windump' and 'ethereal' to compare the HTTP traffic for 'content' (it is possible that some files, in particular images, are being cached, etc.) and 'transactions' (number of TCP connections and requests per connection); some tools will not use either 'persistent' connections or 'pipelining'. When the HTTP traffic is the same (content and transactions) as that of the browser(s) you are interested in, the concern moves to the efficiency (requests per second) of the load generation tool. The bottom line is how close the simulation needs to be to verify the requirements. I specialize in large ERP systems, and the bottlenecks are always with the transactional 'posts', which are never pipelined or attached to an existing persistent connection. So I can use the appropriate tool without paying for features I do not need.
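
The 'content' and 'transactions' comparison can be sketched in Python once the capture has been reduced to (connection, URL) pairs. This is only a sketch: the two captures are hypothetical, the helper names are made up, and extracting the pairs from a windump or ethereal trace is not shown.

```python
from collections import Counter


def traffic_profile(requests):
    """Summarize a capture given as (connection_id, url) pairs.

    Returns the set of URLs fetched and a sorted list of requests per connection.
    """
    urls = {url for _, url in requests}
    per_connection = Counter(conn for conn, _ in requests)
    return urls, sorted(per_connection.values())


def same_traffic(a, b):
    """True when two captures fetch the same content with the same connection usage."""
    return traffic_profile(a) == traffic_profile(b)


# Hypothetical captures: the browser reuses two connections,
# the tool opens a new connection for every request.
browser = [(1, "/index.html"), (1, "/a.gif"), (2, "/b.gif"), (2, "/c.gif")]
tool = [(1, "/index.html"), (2, "/a.gif"), (3, "/b.gif"), (4, "/c.gif")]

print(traffic_profile(browser))
print(traffic_profile(tool))
print(same_traffic(browser, tool))
```

Here the content matches but the transactions do not, which is the case where the tool is not yet a faithful stand-in for the browser.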

Regards

Ian


niteguy
New Member

Posts: 3
Registered: May 2002

posted 09-23-2002 06:02 PM
Scott, thank you for such a comprehensive and interesting answer.

I realize that more details are necessary to understand the situation properly.
I tried to make these measurements as clean as possible, I mean to eliminate disturbing factors. Following the numbering in your answer:
1. I removed all think_times from the Robot and STA scripts. The reasons are: a) only one web page is downloaded in these measurements; b) I cannot introduce delays between the image downloads in IE to make them identical to the Robot script.
I guess this approach is acceptable (please correct me if it is not).

2. There were several measurements for every situation, but not hundreds. The differences I found in the measured times are stable, but the data is not sufficient to calculate standard deviations and draw serious numerical conclusions. Of course I tried to eliminate variance in the measurements; I made sure there was no significant load on the network other than these measurements. The estimated network load is 150 kB / 140 ms ≈ 1 MB/s ≈ 8 Mbit/s, which is substantially less than the capacity of this network.
3. All the time data presented is from Network Monitor. Its accuracy is better than that of Robot or STA (probably tens or hundreds of microseconds, compared to 10 ms in Robot or STA). I filtered all HTTP and TCP frames that passed through NM and calculated the difference between the first and the last one.
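
The arithmetic in points 2 and 3 can be checked with a few lines of Python. This is a sketch under stated assumptions: 1 kB is taken as 1000 bytes, the function names are made up, and frame timestamps are assumed to be available in seconds.

```python
def throughput_mbit_per_s(size_kb, elapsed_ms):
    """Average throughput in Mbit/s for size_kb kilobytes (1 kB = 1000 bytes) in elapsed_ms."""
    bits = size_kb * 1000 * 8
    seconds = elapsed_ms / 1000
    return bits / seconds / 1_000_000


def capture_duration_ms(frame_times_s):
    """Elapsed time between the first and last captured frame, in milliseconds."""
    return (max(frame_times_s) - min(frame_times_s)) * 1000


rate = throughput_mbit_per_s(150, 140)
print(f"{rate:.1f} Mbit/s, {rate / 100:.0%} of a 100 Mbit/s link")
```

For 150 kB in 140 ms this gives roughly 8.6 Mbit/s, i.e. under a tenth of the 100 Mbit/s link, consistent with the estimate above.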


As for the accuracy needed when measuring response time (Ian also mentioned this in his comments), +/- 0.25 sec is probably acceptable from the user's point of view. But suppose we are doing load testing. If the think time a particular tool introduces in its HTTP interaction is fairly high, the web server will be kept busy maintaining connections with clients, but the processor load will not be as high as it is supposed to be in the modeled situation with many VUs. Or it could be just the opposite, because keeping connections open demands processor and memory resources and can distort the picture.

Sincerely,
Niteguy


RSBarber
Moderator

Posts: 852
Registered: Jul 2002

posted 09-23-2002 08:58 PM
I'll try to ramble less this time.

1) That is not exactly true. Most (for the moment, I'll leave it at that) of the think_times in Robot (I haven't used STA, so I won't assume) are actually there to simulate the recording browser better, so taking them out will actually decrease your accuracy. What they represent is the time the browser actually spends processing the image. So comparing no-think_time user response times to actual browser response times isn't "apples to apples".

2) Several can certainly be enough to find patterns. I'll accept the measurements as valid and move on; it was more of a tangent anyway. One of the first things I would do is run an identical test with various file sizes. See if the delta is always the same (e.g. Robot maybe reports .4 seconds faster for a 10 kB, a 150 kB and a 1 MB file). If so, you have found the discrepancy.

3) I do not question the accuracy of Network Monitor. My question was, can you be certain that Network Monitor is starting and ending its measurement at the same points as the load generation tools? Robot has several combinations of ways to measure response times. If you spend some time testing out all of the various combinations, I promise you that you will find more than a .15 second variance between combinations. Again, I have no idea about STA.
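
The file-size experiment from 2) amounts to checking whether the tool-minus-browser difference stays roughly constant as the page grows. A minimal Python sketch, assuming made-up mean timings and an arbitrary 20 ms tolerance (both are illustrative, not values from this thread):

```python
def deltas(tool_ms, browser_ms):
    """Per-size difference (tool minus browser), keyed by page size in kB."""
    return {size: tool_ms[size] - browser_ms[size] for size in tool_ms}


def is_constant_offset(tool_ms, browser_ms, tolerance_ms=20):
    """True when the tool/browser difference varies by at most tolerance_ms across sizes."""
    d = list(deltas(tool_ms, browser_ms).values())
    return max(d) - min(d) <= tolerance_ms


# Made-up mean timings (ms) per page size (kB), for illustration only.
tool = {10: 160, 150: 440, 1000: 2150}
browser = {10: 40, 150: 312, 1000: 2030}

print(deltas(tool, browser))
print(is_constant_offset(tool, browser))
```

A constant offset points at fixed per-page overhead (timers, think times); an offset that grows with size points at how the data itself is transferred.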

I am going to add another topic here that I neglected initially.

4) Threading. You may have noticed that Robot plays back a single VU as a single thread. Your real user is multi-threaded. A single user played back in Robot will ALWAYS be a little slower than a real user with a real browser under otherwise identical circumstances. This issue corrects itself at between 4 and 10 overlapping (not necessarily concurrent) users. Most load generation and reporting tools are the same way. It's something we just deal with, in the same way that we deal with the fact that some of our users might just be out there using Opera and have intentionally set their browser to be single-threaded.

Robot (and LoadRunner, and Silk, just to be fair to the ones I am most familiar with) are consistent, and they are REALLY close. My predictions for real users in real load situations on real production environments, based on tests executed and analyzed using these tools, have been extremely good. Personally, I'll take a +/- 150 millisecond margin of error. I can't make a user community model 100% accurate, and inaccuracies in my model are going to have a significantly bigger margin of error than a consistent 150 ms.
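
A toy Python model shows why single-threaded playback of one user comes out slower than a multi-threaded browser. It is a deliberate simplification: the per-resource times are made up, two parallel connections is an assumption, and the model ignores that the HTML must arrive before the images are requested.

```python
import heapq


def download_time_ms(resource_times_ms, connections):
    """Total time to fetch all resources, in order, over a fixed number of parallel connections.

    Each resource is assigned to whichever connection becomes free first.
    """
    free_at = [0.0] * connections  # time at which each connection is next idle
    heapq.heapify(free_at)
    for t in resource_times_ms:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)


# Made-up per-resource fetch times (ms): one HTML page plus five images.
resources = [60, 40, 40, 30, 30, 20]

print(download_time_ms(resources, 1))  # single-threaded playback
print(download_time_ms(resources, 2))  # browser-like, two connections (assumed)
```

With several overlapping virtual users the server sees interleaved requests either way, which is consistent with the observation that the gap closes at a handful of users.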



qa_tester
Guru

Posts: 363
Registered: Aug 2001

posted 09-24-2002 07:58 AM
Thanks, Scott, for this valuable information. I just have two questions:

1) As far as I know, all these tools simulate multiple users in the same hardware/software environment (all the VUs run on the same machine with the same configuration), but in real life users have a variety of environments, some faster than others. That brings us to the question: how can we really simulate real-life virtual users?

2) In real life, users perform different actions at the same time. For example, user A logs in, places an order and logs out, whereas user B logs in, cancels an order and logs out, and user C logs in, modifies his profile and logs out. How can we simulate these scenarios simultaneously?


RSBarber
Moderator

Posts: 852
Registered: Jul 2002

posted 09-24-2002 08:42 AM
Good questions!

1) The simple answer is "you don't". Now, as you might imagine, the answer really isn't that simple. Remember there are two parts to your model: what the server sees and what the user experiences. The server doesn't care what the user's connection rate is, or how fast their browser processes the pages. All it knows is the time between requests. When you model your scripts, you model distributions of these think_times and simulated connection rates based on your expected user community (more or less easy depending on your tool). If you aren't exactly sure, model a best, an expected and a worst case scenario and compare them.

As for the implied second part of that answer - as far as predicting user experience, the best we can do is predict how long it will take for all of the requests to complete across a certain connection rate and then state our assumption about clean phone lines, minimum hardware, etc, etc.

2) I'm going to cop out on you here. Read articles 2, 3 and 4 on www.perftestplus.com. In short, model it, then code it. Don't be crippled by the "easy" features of the tool you are using. Add conditional logic, looping, model abandonment, etc. If you can model what 90% of your users do 80% of the time, with all the randomness of real users, you can very accurately predict performance for over 95% of what will actually happen on your site.
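
"Model it, then code it" might look like the following Python sketch. Everything in it is hypothetical: the scenario names and shares, the uniform think-time range and the abandonment rate are placeholders for whatever your own user community model says, and a real script would issue requests where this one only records steps.

```python
import random

# Hypothetical user community model: scenario -> (share of users, steps).
SCENARIOS = {
    "purchase": (0.5, ["login", "browse", "add_to_cart", "checkout", "logout"]),
    "cancel_order": (0.2, ["login", "view_orders", "cancel", "logout"]),
    "edit_profile": (0.3, ["login", "profile", "save", "logout"]),
}


def pick_scenario(rng):
    """Choose a scenario name at random, weighted by its share of the user community."""
    names = list(SCENARIOS)
    weights = [SCENARIOS[name][0] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate_user(rng, think_min_s=2.0, think_max_s=15.0, abandon_rate=0.05):
    """Return the (step, think_time_s) pairs one virtual user would perform.

    After any step the user may abandon the session with probability abandon_rate.
    """
    _, steps = SCENARIOS[pick_scenario(rng)]
    session = []
    for step in steps:
        session.append((step, round(rng.uniform(think_min_s, think_max_s), 1)))
        if rng.random() < abandon_rate:
            break  # user gave up before finishing the scenario
    return session


rng = random.Random(42)  # fixed seed so a run can be repeated
for _ in range(3):
    print(simulate_user(rng))
```

Best, expected and worst case runs can then be produced by swapping in different shares and think-time ranges rather than rewriting the script.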


Ian
Advanced

Posts: 175
Registered: Sep 2001

posted 09-24-2002 09:12 AM
Be careful with the 'think_time' in STA. STA will estimate a WAIT time for when it expects the page to be retrieved. That is, STA will use 2 threads per user (as an HTTP 1.1 browser does), but for each request STA will estimate the idle time so as not to waste CPU ticks trying to process a blocked thread. If you set WAIT to zero (in STA) you could be wasting CPU ticks when you run a test of any load. When I want load that complies with true HTTP 1.1, STA is my tool of choice, as it makes the 'connection' object independent from the 'request', and the user can open and close the connections and pipeline. But be aware that there is no 'think_time' as such; it is bundled with WAIT, and this has other implications.

Regards

Ian


RSBarber
Moderator

Posts: 852
Registered: Jul 2002

posted 09-24-2002 10:06 AM
That is great info Ian!

I haven't used STA at all - how does it compare in other areas? Price? Scripting language?

It sounds like something I'd like to "add to my arsenal" if it isn't too costly.


Ian
Advanced

Posts: 175
Registered: Sep 2001

posted 09-24-2002 12:02 PM
Scott,

The Good news is OpenSTA is FREE and Open Source.

The Bad news is OpenSTA is FREE and Open Source.

There is an OpenSTA forum on this site and http://www.opensta.org/ has all the details.

I have used it off and on for about 2 years, and it is the best FREE HTTP load testing tool I have found. It has its own scripting language and there is a 'learning curve' for effective use. I have not found it as 'robust' as other tools (I have had 'hanging threads' etc.), although the tool has improved with later versions and the website is 'active'. I do not use it much now; I have switched to Microsoft's Application Test Center, which does not offer the same flexibility for HTTP 1.1 emulation but is VERY robust, cheap and meets my needs.

Regards

Ian


niteguy
New Member

Posts: 3
Registered: May 2002

posted 09-24-2002 10:44 PM
Hi all,
1. Measuring the differences in processing time between tools while varying the page size is an interesting idea, and I will try it.
2. I measured page processing times only on the server, using Network Monitor, in order to avoid discrepancies between the measurements of different tools.
3. Threading. Yes, scrutinizing the data from Network Monitor I noticed that Robot uses only one thread. IE creates several threads (GET requests and responses sometimes overlap). By the way, Ian wrote that STA uses 2 threads. How can one get this information: from the source code, or some other way (for example, by investigating the data traffic)?
4. Simulating the statistical behavior of the user community is of course very important (as described in articles 1, 2, 3 by Scott). But my point here is to estimate the differences when simulating load with different tools. For example, suppose a web page contains many small images, and the tool's think_time is longer than IE's and also long compared to the data transfer time (this is a valid assumption). Suppose also that the tool has only one thread. Then the download time with Internet Explorer can be substantially less than with the tool, even for HTTP 1.1 with keep-alive connections, especially when client and server are close to each other (small round-trip time), both are fairly busy (think time may not increase equally for IE and the tool) and the network is not overloaded. I chose the worst situation. In such a case the difference between IE and Robot can be very substantial, not only in response time but also in the load put on the server by the same number of users (virtual and real).
5. qa_tester writes that it is difficult to simulate different user behavior and environments. Using a statistical distribution of users by activity and other parameters is the usual approach. But who says we cannot simulate the situation where most users are making a purchase, or just surfing the page? That behavior is less probable, but no less important from a load point of view in extreme situations. It happens. And the results can be predicted more reliably. Dividing a complex problem into small, more understandable parts is a normal approach and gives a good understanding of the problem.
Regards,
Niteguy


RSBarber
Moderator

Posts: 852
Registered: Jul 2002

posted 09-25-2002 06:13 AM
Ian,

Thanks, I'll check it out. Free tools are extremely useful for all kinds of things. I have Rational licenses, but as a consultant, there are rules about when we can and can't use them in pre-sales situations. This could be (at least part of) the answer.


RSBarber
Moderator

Posts: 852
Registered: Jul 2002

posted 09-25-2002 06:29 AM
Wow, this discussion is getting dangerously close to being too deep to handle during a coffee break!

1) It really is kind of fascinating, and more useful in learning ranges for comparison than as a part of every test, if that makes sense.

2) If you measured only server processing time with Network Monitor, then Robot should show a longer time, as it will include the packet travel time and likely either encoding or decoding on the client side (depending on the timer flags). That information leads me to believe that Rational was actually measuring really accurately.

3) Having said that about Robot, it is possible to manually handle the socket connections (threads), and turn your script into little functions in such a way as to make it multi-threaded. I don't know that I really think it's a valuable thing to do, but in the same breath I have to admit that I have done it.

4) Yes, these are important considerations, and yes, I spent a long time struggling with those same topics, and yes, I found every tool to be a little different, particularly in extreme cases. HOWEVER, when I started playing back realistically modeled loads, surfing the site (using Mozilla, which shows you the presentation time for each page by default), jotting down the page load times for every page and comparing them with the results from the tool, I found that most tools are collecting measurements that are more accurate than any normal human can perceive. That is the same approach I used to figure out client-side processing time delays to add to my script rather than accepting the Robot defaults. If you want a perfect measurement for a single action under extreme circumstances, a load generation tool is probably not the way to go.
5) Note - I never said to stop at modeling expected user behavior. When article 10 gets published you will see how I handle exactly those points brought up by qa_tester. Stay tuned.

(sorry, got to give you guys some bait to keep reading my articles - otherwise I'd have no reason to keep writing them!)


All times are PT (US)
