LargeDataSets DBMSTesting October
Archive of work in DBMS Testing with October Group -- Carlos, Jesus, and Manuel
Comments on final reflections, from Dr Thomas
Sorry, Manuel, if I'd known you were adding ./ to all commands, I could have shown you another way. Ubuntu can be told to "look" for programs in your current directory (./).
Jesus / Carlos, thanks for copying in all the detailed instructions for installing and running things. I'm pretty sure the next group will thank you, too.
I might comment that you can exceed the operating system's capacity and make it crash even if your computer is a lot bigger than a netbook. It just takes more work on your part. And, usually, the cooperation of whoever holds root/superuser privileges. (On your netbooks, that's you.)
I hope you had fun this month.
Goal for this week: Finish recording test results for all to see. Write up (here) final reflections on what you learned this month (a couple paragraphs), plus all and any information you'd need to restart this work in Winter. You will forget more than you think you will -- all the tips and tricks for getting it to work that are in your head now need to be in the wiki.
Final reflections and notes for Manuel
I had the misfortune of "forgetting" my mysql password. I had to uninstall and reinstall mysql on Ubuntu and wasn't able to get through all the way installing it on Windows 7. Just reinstalling mysql turned out be an undertaking as you had to use the synaptic manager to fully delete mysql and then get prompted for a new password when reinstalling. I came into this portion of the project thinking the installation of the actual benchmarking tool was not going to be such an issue only to be sadly mistaken. Installation took much more than anticipated and if not patient, can be quite a head ache. Throughout this portion of the research, I came more in touch with installing and setting up things using the command prompt as well as refreshing up on sql commands. Things to remember about this process are:
1. read what other groups have done
2. not everything in readme file works. Look for small things that make the stuff in readme file work as intended
3. adding the "./" to the beginning of the commands given on the readme file
4. adding the suffix to the end of all files referenced in commands
5. changing the paths for java home in all ".sh" files
6. if using mysql, make sure to download the java connector and edit all ".sh" files to show it
7. don't forget to run the load data with the same number of warehouses you intend to test with before the run
8. use the drop table command after each run and
9. make sure you have enough power to finish the test,look at previous times to estimate how much battery you need for a given amount of warehouses.
Overall it was too bad I couldn't get the benchmark installed on Windows 7 to have yet another factor to be able to compare with. It was fun trying to "break" and find the limits of mysql. From the few tests we did, we noticed some difference in time to load when running on battery on plugged in. I forgot to drop all my data once in it was loaded so the tests I ran afterward might have been taking longer due to not "deleting" the data after each run.The next group should be able to do more of this "fun" stuff given all the info already on here. Thanks to the earlier group for their posts and help, it was very useful.
Final reflections and notes for Carlos
The main problem with this project is to install the benchmark software, it requires some patience and quite a bit of time. Free benchmarking tools are out there, but the possibility for one of them to be plug and play (at least for ubuntu) may be a little slim. I really recommend to look at the readme file, follow its instructions and try to make sense off of it. This benchmark installation showed me how to install "real, or complicated to install" software on which the user has to configure files, download "patches" to make an instruction work, follow instructions and the fact that groupwork is the key on making any project a reality. On the other hand, I was able to modify the readme file (with the help of Juan) in order to accomodate the necessary instructions to get this project going.
Final reflections and notes for Jesus
Dear Future Jesus (or who ever is reading this, thinking it will help you get started) (First of all, how are you? Good. Anyway.. )
1. download connector, search for the jar file and put in the BenchMark/lib directory.
2. Mysql.properties <replace info, ok>: driver: com.mysql.jdbc.Driver conn:jdbc:mysql://<localhost>/<database> user: password:
3. If the java is installed correctly and the .sh script is still complaining, add "/usr" after the $JAVA_HOME in each of the .sh scripts.
- Also, add ../lib/<mysql connector name jar>: to the list of libs in each of .sh files.
README file, very useful.
- (add a space after numWarehouse =#.)
Well, that's it, I guess.
What have I learned? When all else fails, reinstall ubuntu. Seriously.
Instructions for building
Use of JDK 1.5 is recommended, build with "ant jar" from the command line of the base directory or use your favorite IDE such as Netbeans or Eclipse.
Instructions for running
The below scripts all use relative paths, but, they depend on JAVA_HOME environment varibale being set so that the correct runtime can be found.
JDBC drivers and sample "?.properties" files are included to make it extremely easy for you to test out the performance of EnterpriseDB, PostgreSQL, MySQL, Oracle, & SQL Svr in your environment.
(Created mysql.properties by copying oracle.properties and changing them to mysql)
1. Go to the 'run/scripts' directory, edit the appropriate "??????.properties" file to point to the database instance you'd like to test. Of course you'll substitute in the name of your appropriate config file in the command lines below.
(find java home in order to place the location at the first line of the runSQL.sh
2. Run the "sqlTableCreates" to create the base tables.
(run this command to see if everything works)
(also done before command worked:
- Downloaded connector/j from mysql website
- Extracted connector/j.jar file and placed on the lib folder
- Then, the name of the .jar file was added on the runSQL.sh by adding :../lib/filename in the run folder
Then the following command was ran, and it worked with mySQL
- ./runSQL.sh mysql.properties sqlTableCreates
Note: "sqlTableCreates" will truncate all the tables so you can start over clean.
There is also a "sqlTableDrops" script if you need it.
3. Run the "loadData" command file to load all of the default data for a benchmark:
A.) Approximately half a million rows in total will be loaded across 9 tables per Warehouse. (The default is numWarehouses=1) A decent test size of data totaling about 1 GB is 10 warehouses as follows:
$ ./loadData.sh mysql.properties numWarehouses= 1
(loadData.sh file was changed to have Java home and the connector/j file on it, like the runSQL.sh)
- I Did not use approach B*
B.) Alternatively, you may choose to generate test out to CSV files that can be bulk loaded as follows:
$ ./loadData.sh mysql.properties numWarehouses=1 fileLocation=c:/temp/
These CSV files can be bulk loaded into EDB-Postgres via the following:
$ runSQL EnterpriseDB.properties sqlTableCopies
You may clean out the data in the tables without dropping them via:
$ runSQL EnterpriseDB.properties sqlTableTruncates
4. Run the "runSQL" command file to execute the SQL script "sqlIndexCreates" to create the primary keys & other indexes on the tables.
- ./runSQL.sh mysql.properties sqlIndexCreates
5. Run the "runBenchmark" command file to execute the swing GUI application to test the database. Don't forget to set the number of warehouses equal to the number you created in step 3
- ./runBenchmark.sh mysql.properties
Another important observation is that over time the performance of processor decreases. I noticed this from the runs with only one warehouse in comparison to seven or more warehouses. Also I found out that the processor is more efficient when is the netbook is plugged in. You can tell by looking at how long time it takes to finish a complete run. For instance there are about 14 seconds difference (run with 3 warehouses) between the netbook plugged in and not (this without the computer going to sleep). Also, after several tries of running the software (with dropping tables) the system crashes. This happened to me twice.
Overall, I really enjoyed this project. It is nice to know you can exceed the Operating system resources and make the operating system crash by executing legal operations. However, the systems we are using are just netbooks.
Goal for this week: Run as many tests of MySQL / Ubuntu / PostgresQL / Windows as we can!
Using BenchmarkSQL. If we can get another benchmarking application going, too, that would be great.
Tests to run:
How long does it take to load 1, 2, 3, 4.... ?? warehouses into MySQL? Measure in minutes/seconds. Measure for the netbooks until something dire happens. Measure on Jesus's virtual Ubuntu-on-a-Mac, but stop short of doing anything dire to the Mac. Remember to drop all tables before starting a new
Loading the database tests:
|Whose Netbook?||Who ran the test?||How many warehouses?||Wall or battery power?||Run time||Start time||End time||rows loaded||rows per second||Comments|
|Ubuntu/Asus IEEE||Carlos||3 (MySQL Ubuntu)||battery||418 seconds||10/15/10, 15:15:51||10/15/10 15:36:46||1,597.694 rows||1273 rows/sec||Forgot to drop tables, this may be the reason why it took so few time|
|Ubuntu/Asus IEEE||Carlos||3 (MySQL Ubuntu)||battery||1,137 seconds||10/20/10, 18:33:04||10/15/10 18:52:02||1,598,240 rows||1405 rows/sec||This time tables were dropped and time is average|
|Ubuntu/Asus IEEE||Carlos||3 (MySQL Ubuntu)||wall||1,123 seconds||10/20/10 13:39:24 2010||10/20/10 13:58:08||1,596,404 Rows||1421 rows/Sec|
|Ubuntu/Asus IEEE||Carlos||7 (MySQL Ubuntu)||wall||2,555 seconds||10/20/10 13:59:35||10/20/10 14:42:10||3,594,128 Rows10/2010 14:42:10||1406 Rows/Sec|
|Ubuntu/Asus IEEE||Carlos||7 (MySQL Ubuntu)||battery||2,567 seconds||10/20/10 18:55:57||10/20/10 19:38:44||3,594,433 Rows||1400 Rows/Sec||Once again, running on battery was slower and rows/second decreased over time.|
|Ubuntu/Asus IEEE||Carlos||20 (MySQL Ubuntu)||battery||7,228 seconds||10/21/10 15:30:32||10/21/10 17:31:01||10,078,571 Rows||1394 Rows/Sec||I believe the battery life has something to do with the perfomance.|
|Ubuntu/Asus IEEE||Carlos||100 (MySQL Ubuntu)||wall||35,451 seconds||10/21/10 18:43:11||10/22/10 04:34:02||49,995,076 Rows||1410 Rows/Sec||To test previous my statement, ran 100 warehouses and found out that the decrease in performance is only when running on battery (Can not run this test using battery power).|
|Ubuntu/Asus IEEE||Jesus||5||wall||419 seconds||10/24/10, 17:51:52||10/24/10 17:58:52||599099 rows||1429 rows/sec|
|Ubuntu/Asus IEEE||Jesus||10||wall||491 seconds||10/24/10, 18:01:00||10/24/10 18:09:12||598329 rows||1219 rows/sec|
|Ubuntu/Asus IEEE||Jesus||15||wall||416 seconds||10/24/10, 18:14:34||10/24/10 18:21:30||599024 rows||1219 rows/sec|
|Ubuntu/Asus IEEE||Jesus||20||wall||414 seconds||10/24/10, 18:26:11||10/24/10 18:33:05||599025 rows||1446 rows/sec|
|Ubuntu/Asus IEEE||Jesus||25||wall||418 seconds||10/24/10, 18:34:57||10/24/10 18:41:55||599651 rows||1434 rows/sec|
|Ubuntu/Asus IEEE||Jesus||30||wall||413 seconds||10/24/10, 18:48:10||10/24/10 18:55:04||598677 rows||1449 rows/sec|
|Ubuntu/Asus IEEE||Jesus||5||battery||418 seconds||10/21/10, 17:23:20||10/21/10 17:30:18||599036 rows||1433 rows/sec|
|Ubuntu/Asus IEEE||Jesus||10||battery||3624 seconds||10/21/10, 10:35:58||10/21/10 11:36:23||5089366 rows||1404 rows/sec|
|Ubuntu/Asus IEEE||Jesus||15||battery||416 seconds||10/21/10, 16:12:45||10/21/10 16:19:42||599789 rows||1441 rows/sec|
|Ubuntu/Asus IEEE||Jesus||20||battery||7209 seconds||10/21/10, 11:48:09||10/21/10 13:48:18||10080749 rows||1398 rows/sec|
|Ubuntu/Asus IEEE||Jesus||25||battery||417 seconds||10/21/10, 17:36:43||10/21/10 17:43:40||598002 rows||1434 rows/sec|
|Ubuntu/Asus IEEE||Jesus||30||battery||415 seconds||10/21/10, 18:29:52||10/21/10 18:36:48||598251 rows||1441 rows/sec|
Running the TPC-C-like benchmarks:
|Whose Netbook?||OS used||DBMS used||Who ran the test?||How many warehouses?||Number of Terminals||Wall or battery power?||Total seconds to run test||start time||end time||rows loaded||rows per second||Comments|
|Ubuntu/Asus IEEE||Obuntu||mySQL||Manuel||1||1||bat~50%||410 sec||10/15/10,15:35:36||10/15/10,15:42:27||598810||1460|
|Ubuntu/Asus IEEE||Obuntu||mySQL||Manuel||2||1||wall||786 sec||10/18/10,14:42:55||10/18/10,14:56:02||1097527||1396|
"Pathologies of Big Data", Communications of the ACM, August 2009. Read this by next week. You might need to be on a campus computer to access the full text of the article.
Transaction Processing Council -- we are going to try to get one of their benchmarks running.
Report on a TPC-H test someone ran on MonetDB. Included here to give you an idea of the kind of report we might want to generate. Their results are, now, old, so our results would be fresher.
PostgresQL - another free DBMS we could compare to MySQL.
Oracle 10g Express DBMS -- a free, starter DBMS product from the giant DBMS company, Oracle
BenchmarkSQL open source software for running TPC-C on a variety of DBMS products.
List of database benchmarking tools, plus DBMonster. On the list, the "Open Source Development Labs Database Test Suite", BenchmarkSQL, Bristlecone, Java TPC-W, TPCC-UVa and Hammerora look most promising. Sourceforge seems to have more free DBMS benchmarking tools, too.
Try to get a TPC benchmark running against MySQL on your netbooks. Any TPC benchmark will do.
Keep track of:
- Useful commands to the computer. Ex. "mysql -h hostname -u username -ppassword", for starting mysql up at the command line in Ubuntu.
- Useful web sites, useful software
- Points where installing or using software got hard/confusing, and how you got around the problem (if you do) -- this tells us where software is not ready for "Joe Public" to use it, which is valuable information for improving the software.