LargeDataSets DBMSTesting October

From Wiki
Jump to: navigation, search

Archive of work in DBMS Testing with October Group -- Carlos, Jesus, and Manuel

Comments on final reflections, from Dr Thomas

Sorry, Manuel, if I'd known you were adding ./ to all commands, I could have shown you another way. Ubuntu can be told to "look" for programs in your current directory (./).

Jesus / Carlos, thanks for copying in all the detailed instructions for installing and running things. I'm pretty sure the next group will thank you, too.

I might comment that you can exceed the operating system's capacity and make it crash even if your computer is a lot bigger than a netbook. It just takes more work on your part. And, usually, the cooperation of whoever holds root/superuser privileges. (On your netbooks, that's you.)

I hope you had fun this month.


Goal for this week: Finish recording test results for all to see. Write up (here) final reflections on what you learned this month (a couple paragraphs), plus all and any information you'd need to restart this work in Winter. You will forget more than you think you will -- all the tips and tricks for getting it to work that are in your head now need to be in the wiki.

Final reflections and notes for Manuel

I had the misfortune of "forgetting" my mysql password. I had to uninstall and reinstall mysql on Ubuntu and wasn't able to get through all the way installing it on Windows 7. Just reinstalling mysql turned out be an undertaking as you had to use the synaptic manager to fully delete mysql and then get prompted for a new password when reinstalling. I came into this portion of the project thinking the installation of the actual benchmarking tool was not going to be such an issue only to be sadly mistaken. Installation took much more than anticipated and if not patient, can be quite a head ache. Throughout this portion of the research, I came more in touch with installing and setting up things using the command prompt as well as refreshing up on sql commands. Things to remember about this process are:

1. read what other groups have done

2. not everything in readme file works. Look for small things that make the stuff in readme file work as intended

3. adding the "./" to the beginning of the commands given on the readme file

4. adding the suffix to the end of all files referenced in commands

5. changing the paths for java home in all ".sh" files

6. if using mysql, make sure to download the java connector and edit all ".sh" files to show it

7. don't forget to run the load data with the same number of warehouses you intend to test with before the run

8. use the drop table command after each run and

9. make sure you have enough power to finish the test,look at previous times to estimate how much battery you need for a given amount of warehouses.

Overall it was too bad I couldn't get the benchmark installed on Windows 7 to have yet another factor to be able to compare with. It was fun trying to "break" and find the limits of mysql. From the few tests we did, we noticed some difference in time to load when running on battery on plugged in. I forgot to drop all my data once in it was loaded so the tests I ran afterward might have been taking longer due to not "deleting" the data after each run.The next group should be able to do more of this "fun" stuff given all the info already on here. Thanks to the earlier group for their posts and help, it was very useful.

Final reflections and notes for Carlos

The main problem with this project is to install the benchmark software, it requires some patience and quite a bit of time. Free benchmarking tools are out there, but the possibility for one of them to be plug and play (at least for ubuntu) may be a little slim. I really recommend to look at the readme file, follow its instructions and try to make sense off of it. This benchmark installation showed me how to install "real, or complicated to install" software on which the user has to configure files, download "patches" to make an instruction work, follow instructions and the fact that groupwork is the key on making any project a reality. On the other hand, I was able to modify the readme file (with the help of Juan) in order to accomodate the necessary instructions to get this project going.

Final reflections and notes for Jesus

Dear Future Jesus (or who ever is reading this, thinking it will help you get started) (First of all, how are you? Good. Anyway.. )

1. download connector, search for the jar file and put in the BenchMark/lib directory.

2. <replace info, ok>: driver: com.mysql.jdbc.Driver conn:jdbc:mysql://<localhost>/<database> user: password:

3. If the java is installed correctly and the .sh script is still complaining, add "/usr" after the $JAVA_HOME in each of the .sh scripts.

    • Also, add ../lib/<mysql connector name jar>: to the list of libs in each of .sh files.

README file, very useful.

  • (add a space after numWarehouse =#.)

Well, that's it, I guess.

What have I learned? When all else fails, reinstall ubuntu. Seriously.

Instructions for building

Use of JDK 1.5 is recommended, build with "ant jar" from the command line of the base directory or use your favorite IDE such as Netbeans or Eclipse.

Instructions for running

The below scripts all use relative paths, but, they depend on JAVA_HOME environment varibale being set so that the correct runtime can be found.

JDBC drivers and sample "?.properties" files are included to make it extremely easy for you to test out the performance of EnterpriseDB, PostgreSQL, MySQL, Oracle, & SQL Svr in your environment.

(Created by copying and changing them to mysql)

1. Go to the 'run/scripts' directory, edit the appropriate "??????.properties" file to point to the database instance you'd like to test. Of course you'll substitute in the name of your appropriate config file in the command lines below.

(find java home in order to place the location at the first line of the

2. Run the "sqlTableCreates" to create the base tables.

(run this command to see if everything works)

(also done before command worked:

  • Downloaded connector/j from mysql website
  • Extracted connector/j.jar file and placed on the lib folder
  • Then, the name of the .jar file was added on the by adding :../lib/filename in the run folder


Then the following command was ran, and it worked with mySQL

- ./ sqlTableCreates

Note: "sqlTableCreates" will truncate all the tables so you can start over clean.

There is also a "sqlTableDrops" script if you need it.

3. Run the "loadData" command file to load all of the default data for a benchmark:

A.) Approximately half a million rows in total will be loaded across 9 tables per Warehouse. (The default is numWarehouses=1) A decent test size of data totaling about 1 GB is 10 warehouses as follows:

$ ./ numWarehouses= 1

( file was changed to have Java home and the connector/j file on it, like the

  • I Did not use approach B*

B.) Alternatively, you may choose to generate test out to CSV files that can be bulk loaded as follows:

$ ./ numWarehouses=1 fileLocation=c:/temp/

These CSV files can be bulk loaded into EDB-Postgres via the following:

$ runSQL sqlTableCopies

You may clean out the data in the tables without dropping them via:

$ runSQL sqlTableTruncates

4. Run the "runSQL" command file to execute the SQL script "sqlIndexCreates" to create the primary keys & other indexes on the tables.

- ./ sqlIndexCreates

5. Run the "runBenchmark" command file to execute the swing GUI application to test the database. Don't forget to set the number of warehouses equal to the number you created in step 3

- ./

Another important observation is that over time the performance of processor decreases. I noticed this from the runs with only one warehouse in comparison to seven or more warehouses. Also I found out that the processor is more efficient when is the netbook is plugged in. You can tell by looking at how long time it takes to finish a complete run. For instance there are about 14 seconds difference (run with 3 warehouses) between the netbook plugged in and not (this without the computer going to sleep). Also, after several tries of running the software (with dropping tables) the system crashes. This happened to me twice.

Overall, I really enjoyed this project. It is nice to know you can exceed the Operating system resources and make the operating system crash by executing legal operations. However, the systems we are using are just netbooks.


Goal for this week: Run as many tests of MySQL / Ubuntu / PostgresQL / Windows as we can!

Using BenchmarkSQL. If we can get another benchmarking application going, too, that would be great.

Tests to run:

How long does it take to load 1, 2, 3, 4.... ?? warehouses into MySQL? Measure in minutes/seconds. Measure for the netbooks until something dire happens. Measure on Jesus's virtual Ubuntu-on-a-Mac, but stop short of doing anything dire to the Mac. Remember to drop all tables before starting a new

Loading the database tests:

10/2010 14:42:10
Whose Netbook?Who ran the test?How many warehouses?Wall or battery power?Run timeStart timeEnd timerows loadedrows per secondComments
Ubuntu/Asus IEEECarlos3 (MySQL Ubuntu)battery418 seconds10/15/10, 15:15:5110/15/10 15:36:461,597.694 rows1273 rows/secForgot to drop tables, this may be the reason why it took so few time
Ubuntu/Asus IEEECarlos3 (MySQL Ubuntu)battery1,137 seconds10/20/10, 18:33:0410/15/10 18:52:021,598,240 rows1405 rows/secThis time tables were dropped and time is average
Ubuntu/Asus IEEECarlos3 (MySQL Ubuntu)wall1,123 seconds10/20/10 13:39:24 201010/20/10 13:58:081,596,404 Rows1421 rows/Sec
Ubuntu/Asus IEEECarlos7 (MySQL Ubuntu)wall2,555 seconds10/20/10 13:59:35 10/20/10 14:42:103,594,128 Rows1406 Rows/Sec
Ubuntu/Asus IEEECarlos7 (MySQL Ubuntu)battery2,567 seconds10/20/10 18:55:57 10/20/10 19:38:443,594,433 Rows1400 Rows/SecOnce again, running on battery was slower and rows/second decreased over time.
Ubuntu/Asus IEEECarlos20 (MySQL Ubuntu)battery7,228 seconds10/21/10 15:30:32 10/21/10 17:31:0110,078,571 Rows1394 Rows/SecI believe the battery life has something to do with the perfomance.
Ubuntu/Asus IEEECarlos100 (MySQL Ubuntu)wall35,451 seconds10/21/10 18:43:11 10/22/10 04:34:0249,995,076 Rows1410 Rows/SecTo test previous my statement, ran 100 warehouses and found out that the decrease in performance is only when running on battery (Can not run this test using battery power).
Ubuntu/Asus IEEEJesus5wall419 seconds10/24/10, 17:51:5210/24/10 17:58:52599099 rows1429 rows/sec
Ubuntu/Asus IEEEJesus10wall491 seconds10/24/10, 18:01:0010/24/10 18:09:12598329 rows1219 rows/sec
Ubuntu/Asus IEEEJesus15wall416 seconds10/24/10, 18:14:3410/24/10 18:21:30599024 rows1219 rows/sec
Ubuntu/Asus IEEEJesus20wall414 seconds10/24/10, 18:26:1110/24/10 18:33:05599025 rows1446 rows/sec
Ubuntu/Asus IEEEJesus25wall418 seconds10/24/10, 18:34:5710/24/10 18:41:55599651 rows1434 rows/sec
Ubuntu/Asus IEEEJesus30wall413 seconds10/24/10, 18:48:1010/24/10 18:55:04598677 rows1449 rows/sec
Ubuntu/Asus IEEEJesus5battery418 seconds10/21/10, 17:23:2010/21/10 17:30:18599036 rows1433 rows/sec
Ubuntu/Asus IEEEJesus10battery3624 seconds10/21/10, 10:35:5810/21/10 11:36:235089366 rows1404 rows/sec
Ubuntu/Asus IEEEJesus15battery416 seconds10/21/10, 16:12:4510/21/10 16:19:42599789 rows1441 rows/sec
Ubuntu/Asus IEEEJesus20battery7209 seconds10/21/10, 11:48:0910/21/10 13:48:1810080749 rows1398 rows/sec
Ubuntu/Asus IEEEJesus25battery417 seconds10/21/10, 17:36:4310/21/10 17:43:40598002 rows1434 rows/sec
Ubuntu/Asus IEEEJesus30battery415 seconds10/21/10, 18:29:5210/21/10 18:36:48598251 rows1441 rows/sec

Running the TPC-C-like benchmarks:

Whose Netbook?OS usedDBMS usedWho ran the test?How many warehouses?Number of TerminalsWall or battery power?Total seconds to run teststart timeend timerows loadedrows per secondComments
Ubuntu/Asus IEEEObuntumySQLManuel11bat~50%410 sec10/15/10,15:35:3610/15/10,15:42:275988101460
Ubuntu/Asus IEEEObuntumySQLManuel21wall786 sec10/18/10,14:42:5510/18/10,14:56:0210975271396


virtualBox -link to software for running virtual OSes. FREE. by -Jesus


"Pathologies of Big Data", Communications of the ACM, August 2009. Read this by next week. You might need to be on a campus computer to access the full text of the article.

Transaction Processing Council -- we are going to try to get one of their benchmarks running.

Report on a TPC-H test someone ran on MonetDB. Included here to give you an idea of the kind of report we might want to generate. Their results are, now, old, so our results would be fresher.

PostgresQL - another free DBMS we could compare to MySQL.

Oracle 10g Express DBMS -- a free, starter DBMS product from the giant DBMS company, Oracle

BenchmarkSQL open source software for running TPC-C on a variety of DBMS products.

List of database benchmarking tools, plus DBMonster. On the list, the "Open Source Development Labs Database Test Suite", BenchmarkSQL, Bristlecone, Java TPC-W, TPCC-UVa and Hammerora look most promising. Sourceforge seems to have more free DBMS benchmarking tools, too.

Try to get a TPC benchmark running against MySQL on your netbooks. Any TPC benchmark will do.

Keep track of:

  1. Useful commands to the computer. Ex. "mysql -h hostname -u username -ppassword", for starting mysql up at the command line in Ubuntu.
  2. Useful web sites, useful software
  3. Points where installing or using software got hard/confusing, and how you got around the problem (if you do) -- this tells us where software is not ready for "Joe Public" to use it, which is valuable information for improving the software.