LargeDataSets DBMSTesting September

From Wiki
Jump to: navigation, search

Archive of work in DBMS Testing with September group -- Roberto, Juan and Max


What we want to fill in:

DBMSOSdatabase sizequery identifierruntime
MySQLUbuntu???particular TPC-H query???? seconds


9/24/10

Final reports, sum up. What you learned this month, what you wish you'd known at the start of the month, any technical details you might want to know in the future and will forget in the next couple months.

         Roberto:

While there are things that I learned during the first month of the HPCRC project, there are also those things that I had already known, but were reinforced with the project, and things I would have liked to have known at the start of the project.

The first thing I went about doing was installing MySQLand a version of DB2 Personal Edition on the Windows side of the netbook. I thought it would be cool to have MySQL on both the Ubuntu and Windows sides. I thought we could try benchmark tests on the same database software on different operating systems. Though another factor in my decision to do so is the fact that I feel more comfortable installing software on Windows than on Ubuntu. I find that software installation tends to be much easier on Windows, such as Oracle Express Edition and Hammerora which were easy to install on Windows. The MySQL worked, but issues came up with the DB2 software (see below).

During the weekend of September third, I downloaded a zip file for Bristlecone (I have it on both operating systems, so I might have downloaded it for both that weekend), and unzipped it. I failed to get something going with that due to several reasons. One of which was the fact that a relative’s wedding took place that weekend and a family of relatives stayed over. At some point I tried following Juan’s instructions for the TPC-H benchmark (extract files, create makefile, etc), which generated some .tbl files for me. I then followed Maximino’s instructions for loading the .tbl contents into MySQL. I then attempted to run some queries, but encountered an error similar to what Juan also encountered:

using 1284681633 as a seed to the RNG Open failed for ./3.sql at qgen.c:170

Afterwards, when things didn’t seem to be progressing with that, I proceeded to install Oracle and Hammerora, which seemed to show promise, judging by Juan’s postings. I followed the instructional PDF file and created a test schema. I followed the instructions until task 4 on page 21 of the PDF file. After Dr. Thomas announced that she had been able to get Benchmark SQL to create tables in ubuntu, I then proceeded to try that myself. I loaded up a single warehouse, which took 358 sec.

Entered the command

./runSQL.sh mysql.properties

This came up often:

-- SQL Runtime Exception

-----------------------------------------

DBMS SqlCode=1072 DBMS Msg=

 Key column 'c_w_id' doesn't exist in table

-----------------------------------------------------------------

I had no issues with the GUI. I did NOT alter the warehouse number values as Juan mentioned in one group meeting. The GUI did not freeze up, though the terminal seemed to have done so.

Not sure when I did so, but I also downloaded DBMonster on both operating systems. I did not get anything done with that. I attempted to download ODBC, but the download was not successful. The company associated with it did bug me with some emails afterward wanting to know about my opinions on the software or something like that.

Overall, I found that I was the group member making the least amount of progress. Others took initiative and tried out various pieces of software. If felt intimidated and achieved very little early on in the project. Afterwards, I found myself I mostly playing catch up and following the instructions the other members shared on the wiki page. Rather than struggle on something different from the rest, I tended to work on what others were seeming to have success with. I wish that we would have seen sooner that the TPC-H DBGEN stuff was not going to be successful. We should have also thought about how it was supposed to access the database without the user’s information or name of the database. It might have been useful for members to meet even if not all members could be present. What I mean is if there happened to be a time, aside from the regular group meetings, in which two members could have met, then it probably should have been done. Those members should then inform the rest on any new information, so that no one is left out of the loop. It also would have been nice to have tried to get stuff going during the group meetings. I felt the whole thing was mostly people being “lone wolves” trying to get something working, then sharing some info as to what they were able to accomplish. I feel we could have been more team oriented.

P.S. This isn't relevant, but this wiki editor and formatting is incredibly annoying. I deleted lots of stuff, because I could not format it the way I wanted. If only I could upload a .doc file.

Suggestions

• Be a team.

• Keep each other up to date.

• Store passwords. Keep them in a file and possibly email them to yourself.

• Take screenshots during installations (Can also be used to store passwords)


         Juan: What I Learned This Month...

If one program/benchmark does not work the way it is suppose to work, TAKE A BREAK or MOVE ON. Think about things and if you really have to, move onto the next batch of programs to experiment with. Read all instructions CAREFULLY and talk things through with your team members. That is why we are all here. To learn from each other. I found that an area I became stuck in, was exactly an area that my partner was able to get through, no problem.

Take notes where you have stopped with your current research. Write or type everything down because you will forget some little details here and there. Write down everything from simple user names and passwords, to the directory where you can find your current running java. Choose a program/benchmark that will work with modern databases that you are currently working with. If you are getting many syntax errors with any scripts, make sure that the code in the scripts are up to date with current versions of that software or program.

One of the problems that came up was staying with an umpromising benchmark for a long amount of time. We probably should have tried other benchmarks or another OS quicker. Also, I found some problems to just fix themselves. What I mean is, as I moved on to another benchmark, I started to work on another benchmark's instructions and README files. I learned more information about how some certain things worked and was able to take this new information and apply it to the earlier benchmark that I got stuck on. Just because you are stuck on one benchmark and can't get it to run, doesn't mean it's the end of the world. There are many benchmarks out there. There are many ways to do a benchmark test, you just have to find the one that works best for you.

Don't try to do too much on your own. BE A TEAM and use those extra five minutes to talk about things with other people in your group. DON'T TAKE TIME TOGETHER FOR GRANTED. There were times in the first couple of meetings/weeks when we all could have benefited as a group and worked together in the lab for another half hour after meeting and got pass many road blocks.

Just have fun. It's real world research. There will be road blocks.

Still trying to figure out a name. Maybe... "D.B.MinES or DBMinES"?? (Database, Mining and Expert Simulations) Sounds like "Dee-Bee-Minds". Just a thought.

         Max:

One of the things that I learned this month was actually re-learning that sometimes it is better to seek a new solution to a problem rather than to try to patch a buggy/partial solution. Our intention was to get TPC-H going and test how much workload it would take to “break” MySQL or our netbooks. After going through the hellish nightmare that it was to create dbgen and qgen, we realized that both were obsolete. I tried to make them work (wasted a lot of time instead of looking for another solution), alas, I could only get query number three to work. It took 49 minutes for MySQL to run the query on a data set that was a tenth of the size of the minimum required for any of my findings to be valid. This approach did not shed much light onto the transactions per minute or kept track of time.

I wish we had known at the start of the month that a more successful path awaited us if we tried other solutions such as Hammerora. If we had taken that path, we would have advanced much faster which was critical considering that we had a very limited time frame. The only technical details that I deemed significant enough to write down so I do not forget are as follows:

• Read all the documentation provided with the software.

• Inspect the data to be used in the testing.

• Document your work. Write down stuff like commands and other details as you work; this is very important because it will allow you to recreate an event.

• Gather different sources on the subject; you will get a better picture of that which you are trying to understand. Even if the authors explain the exact same subject, the differences in the approach they take might help you understand the subject better.

. Build a strong teamwork atmosphere.

9/17/10

Dr Thomas says: there are video cameras available for us to borrow, if you want to make the YouTube video we discussed for your project.

Juan - syntax problems keeping him from running TPC-H with MySQL on Ubuntu. Thinks he can get it going over the weekend.

Roberto - can create tables. Also working on TPC-H/MySQL/Ubuntu.

Tables created. Filled with data. Running queries is the sticking point.

Max - couldn't make the meeting.

Ok, we really do need to get a benchmarking tool going. It does not look like the (very old) TPC-H can be got going at all -- the original code is pre-MySQL, so some of the qgen assumptions about how to structure queries don't really work. In particular, there does not appear to be a way to tell qgen about a valid user name and password so that it can successfully connect to the local MySQL database server. (In the old days, people didn't protect databases with passwords as carefully as they do now.)

So we really need to get a benchmarking tool going. Something someone else has built, that is modern enough to know about MySQL, passwords, etc. DBMonster, BenchmarkSQL, Bristlecone. All appear to hinge on getting the JDBC parameters for MySQL correct. During our meeting, we did find that Bristlecone comes with some sample JDBC/MySQL properties files, which should show us at least the right syntax.

So, GET A BENCHMARKING TOOL GOING. Give it a little effort every day, and document your progress here so everyone can see. (And, if possible, use your advice to skip past problems that puzzled you.)

We want to fill in the table at the top of the page with a bunch of results!

Dr Thomas's updates:

       weekend:  

If it is relevant, there are a few books in the Safari Books Online collection with introductions to JDBC in them. Chapter 22 of "Java: A Beginner's Tutorial" and Chapter 4 of "Core Java™ Volume II–Advanced Features, 8th Ed" both cover JDBC (Java / DBMS connectivity). To access Safari Books Online, you have to go through the CSU Library web site (look for "databases" pertaining to computer science). Our library pays for access to the full Safari Online collection of books.

       Monday:

Huzzah! I got BenchmarkSQL to create tables in my Ubuntu/MySQL installation. Still have an odd java Exception (MySQLNonTransientConnectionException) to deal with, but, yay! The problem seems to be that BenchmarkSQL is installed with JDBC drivers for some DBMS engines already stored in BenchmarkSQL's lib directory. But not the JDBC driver for MySQL. So I downloaded Connector/J (Google "jdbc mysql driver" and it will be the first item in the search results). Put the blah.tar.gz file I downloaded (generic download, for Ubuntu) into a temporary directory; typed "gunzip filename.gz" then "tar xvf filename", which extracted a bajillion files. I only wanted the one whose name ended in .jar. I copied that file into the BenchmarkSQL/lib directory.

The first command the BenchmarkSQL README file recommends running is the 'runSQL.sh mysql.properties sqlTableCreates' command, which - with the runSQL.sh file edited to include the MySQL JDBC driver into the java command line -- now almost works. Creates the tables, then pops out an exception.

I'm not sure if it matters, but I started up MySQL from the command line in another Terminal window before trying out BenchmarkSQL commands. Not sure if BenchmarkSQL is smart enough to start up the MySQL server on its own. After editing the loadData.sh file to add in the location of the MySQL driver, it appears to be happily loading data. Default database size ("numWarehouses=1"). I hit a formatting error when I tried to set the number of warehouses to 10 -- looks like their readme.txt file may not be specifying exactly how to write that command line. (A little eyeballing of the Java source might find me the right format to change warehouse size.)

BenchmarkSQL now running, on Ubuntu, with MySQL, creating tables and loading data. Good enough progress for one evening.

       Tuesday:

Whoops, sorry. Somehow I thought I'd already shared the mysql.properties file. Mine says:
driver=com.mysql.jdbc.Driver
conn=jdbc:mysql://localhost/mydatabasename
user=myusernamewithpermissionstomydatabase
password=myuserspassword

The mysql.properties file goes in the BenchmarkSQL run directory. And I'm having to add ":../lib/mysql-connector-java-5.1.13-bin.jar" to every .sh file before it works. (lib is where I put the .jar file for the MySQL JDBC driver I downloaded.)

I did find out I could run BenchmarkSQL stuff without having mysql running in another Terminal window. (Which wasn't too surprising, since my "does JDBC work" test program was able to successfully connect to the MySQL database when I had not started mysql at the command prompt. But I was extra-cautious last night.)

Oooo. Evidently, once the tables are created and loaded, we get to use a wee little GUI to run the tests... A wee little GUI with no Help menu. Hmmmm.

Either my netbook is very slow, or there is a problem running a query. Given that Max ran a query for 3 hours, I'll give the GUI more time.

       Wednesday:

My transactions never did finish. Did the netbook time out due to no activity before they could?

I don't know. I did find out the "kill all terminals/transactions" button the BenchmarkSQL GUI is a hair buggy, since it didn't die on command. MySQL seems to have frozen up and my netbook hard drive was starting to make whining noises, so I rebooted. Listen to your netbooks -- if the fan/hard drive starts making a lot of noise, it may be time to kill a program or force a reboot.

       Thursday:

I found out why I could not create 10 warehouses as the README file implied. (Though, since my tests did not run to completion with only one warehouse, this may be useless information.) The source code for LoadData.java expects the number of warehouses to be the next *argument* to the program, not the last few characters on the end of the numWarehouses String.

So, the README file should say:
loadData mysql.properties numWarehouses= 10
to create 10 warehouses. Space is needed after the = sign!

I'm now loading 2 warehouses. I may wind up deleting this paragraph if the load fails.

It worked. Took about 13 minutes. (781 seconds)

Hmmm. Wonder how long it would take to load, say, 4 warehouses. Twice as long?

Yes. 1566 seconds, or 26 minutes. With interesting, intermittent hard drive/fan noises. 2257 sec (37 min) to load 6 warehouses. 2966 seconds (49.4 minutes) to load 8 warehouses. Linear scaling of load times, so far -- that is good, in Efficiency World.


Juan's updates:

       weekend: 

I gave up on Bristlecone for a just a few hours and had a last second decision to try another benchmarking tool. I was looking at Hammerora directions and they looked so much more user friendly to do them on the windows side. So late Friday night, I decided to try Hammerora on the windows side and was able to get some tpc-c stuff going together with the oracle express edition database and mysql. I found out that using the windows side was much more user friendly with some nice GUI and much easier to install oracle and mysql.

If you go to the windows side, log onto the dbms wiki, look for the Hammerora link in the list of benchmarking tools and follow the pdf of directions on the site. After following the directions on the PDF from Hammerora, you can get pretty far with using an oracle database in conjunction with Hammerora and mysql. Hammerora is very useful because you can set your own parameters, create your own virtual users, have your data uploaded to the oracle website, run tests (TPC-C and TPC-H), and look at your results on their website. There is an option to use tpch-h for oracle, but i have not figured out how to use that yet. You'll know what i'm talking about if you install Hammerora.

Since all of my databases were located on the ubuntu side, I was not able to run tests on them. But i Have an idea. I was thinking i could just copy my dbgen .tbl files and sql scripts on a microsd card that i have and load them into mysql on the windows side. Then i could test it with oracle xe and hammerora on the windows side, upload the result to the oracle db and mysql db and look at the results. I haven't done that yet, but i hope to get that going by the end of monday. I feel i have made some progress on the windows side, but i'm not going to stop there.

Now, I've logged back into the ubuntu side and found out how to install hammerora and the oracle xpress edition (very tough for linux systems to get going by the way, for some reason it's always been like this) I'm still looking at Bristlecone and going to try to figure that out. But for now at least, I'm pretty sure I have some TPC-C stuff going with Hammerora on the windows side. By the way, Hammerora directions made things very easy and simplified the testing process very much. If you have time, try to use Hammerora on the windows side. If you have any questions, just email or text me. 209-604-3621

(9-19-10 12:21 AM)

       Monday:

Got off work late...looked at the Wiki and gonna try to get something going late tonight or early tomorrow. (9-20-10 11:30 PM)

SUCCESS!!! Well I stayed up a little late and was able to get some tables created with BenchmarkSQL...HOORAY!! Thank you Dr. Thomas! BTW, creating that mysql.properties was a pain in the a**! Anyway, I got it to work and I'm gonna work some more on it tomorrow, or I guess later today. Thanks!! (9-21-10 1:58 AM)

       Tuesday:

Nothing to report.

       Wednesday:

Nothing to report.

       Thursday: 

Loading warehouses and playing around with the BenchmarkSQL. No hard drive sounds yet. Something different between Dr Thomas' computer and mine?? I'm pretty sure she is using Ubuntu on Eee PC.

Loaded one warehouse and it took 478 seconds or about 8 minutes.

Loaded 2 warehouses and it took 898 seconds or about 15 minutes Rows Loaded = 1098267 Rows Rows Per Second = 1223 Rows/Sec

Now I've typed in the next command "./runSQL.sh mysql.properties sqlIndexCreates" in my terminal, started the GUI for BenchmarkSQL with "./runBenchmark.sh mysql.properties, clicked on the Terminals tab and made sure that the number of warehouses matched the number of warehouses that you specified earlier with "loadData EnterpriseDB.properties numWarehouses= 10".

Clicked on the Controls tab and clicked Create Terminals.

It should have been successfull and now you are ready to run some transactions. Run Transactions tab should be clickable now, so click on it and let it run. BTW, in the GUI screen it will tell you where it is creating a txt file for you to look at your results. They should be located in BenchmarkSQL/run/reports.


Roberto's updates:

       weekend:

Looked through the README files and some of the text files.

       Monday:

Installed Oracle Express Edition & Hammerora on Windows.

       Tuesday:

Downloaded and extracted BenchmarkSQL and JDBC files. Don't quite grasp as to how to proceed from here.

       Wednesday:

Created the correct mysql.properties file. Still can't get it to work. (I have the .jar file in the lib directory and the path to it in the .sh files)

       Thursday:

Dr. Thomas got BenchmarkSQL working for me. Loaded a single warehouse, which took 358 sec. Then ran ./runSQL.sh mysql.properties sqlIndexCreates, followed by ./runBenchmark.sh mysql.properties. There were several error messages throughout the process, but it seemed to work fine other than that.

Max's updates:

       weekend: I wasted all weekend driving on I5 trying to solve a particular problem.
       Monday: The query query generated by qgen using template 3 seems to work; maybe we could use that.
               I will work on it tonight and post my findings tomorrow.
       Tuesday: I am generating new tables (smaller tables) so I can use them in both the ubuntu and windows Mysql
                versions. I will have to input the query by hand in both cases. This time around I will keep track
                of the loading and search times.  
       Wednesday:  Thinking that the lineitem table was too big I reduced the SF to .1. I get 600000 rows 
                 instead of 6000000. I tried running the query again; it ran faster but with the same output 
                (empty set). It turns out that the l_shipdate column had all dates set to 0000-00-00. The query 3
                (Shipping priority) looks for 10 unshipped orders (of the highest values) within a specified time-
                 frame randomly generated (usually in the 199X's ). Since the dates in the table were 0000-00-00, 
                 the return was always the empty set. I have to figure out what went wrong when generating the 
                 table. ....almost forgot... I got the mysqlclient talking with the server. Now I just need the
                 table fixed.
       Thursday:

9/10/2010

Allllllmost there... Working by next week, please. Then we test and document our results.


9/3/2010

List of database benchmarking tools, plus DBMonster. On the list, the "Open Source Development Labs Database Test Suite", BenchmarkSQL, Bristlecone, Java TPC-W, TPCC-UVa and Hammerora look most promising. Sourceforge seems to have more free DBMS benchmarking tools, too.

Task for next Friday: get a free benchmark tool working. Any one will do, running any TPC-?-like benchmark. We just want to get to the benchmarking and testing part of our program.

Tentative plan: get a benchmarking tool working. Then each student picks a (free) DBMS product and uses the benchmarking tool to hammer the product and see what happens. If we all use the same tool and the same operating system, the tests will be a comparison of the DBMS products.

When you get a benchmarking tool working, EMAIL THE REST OF US. And post how you got it working, and what settings or command prompts or what have you, to this wiki, so we can all get it working, too. Update this wiki with what product you're currently working on and, if/when you give up and start working on a different tool, update the wiki. That way, if two students are trying to get the same suite going, we'll know. (So you can help each other.)

Roberto installed MySQL on the Windows side of his netbook. Thought it'd be nice to have it on both Ubuntu and Windows. Also installed trial version of DB2 Personal Edition, though it seems to not have installed properly.DB2 error.png


I also receive the second error when I boot into Windows.

DB2 boot error.png

Possible fix: [http://dotbay.blogspot.com/2009/06/db2javit-rc-9505-sql5005c-system-error.html this blog seems to have a fix for the DB2 SQL5005C system error]

I have downloaded a zip folder for bristlecone and unzipped it, but in my ignorance do not know how to use it. The weekend was not very productive, as there was a wedding, and I had relatives over. I followed Juan's instructions and entered the "make" command. I now have some .tbl files.

9/15/10: Piped in the .tbl files into a MySQL database (lineitem.tbl took a while). Will attempt next step when I have time to allow a large query to run.

9/16/10: Encountered error similar to Juan's: -- using 1284681633 as a seed to the RNG Open failed for ./3.sql at qgen.c:170

9/17/10: Simply pasted the query Max has on the wiki. It returned an empty set.

Juan is working on the TPC-H Benchmark now. I've given up trying to get BenchmarkSQL working with mysql. There are many benchmarks to work with, so there will be one that should work. I'm hoping it's the TPC-H benchmark. (9-4-10)

I may have something here...

Instructions for starting up DBGEN and QGEN for use with TPC-H? I got something to go, but I don't know what yet.

1) Go to TPC webpage and click on TPC-H link on the left.

2) Download the DBGEN file under the TOOLS section. Choose whichever file is appropriate for your system. For my Ubuntu OS I downloaded the tar.gz file.

3) When the download is complete, put the tar.gz file somewhere you can find it. When you have it where it's easy to locate, extract the files there.

    NOTE: This was the easy part of the process. Now this next part took me a lot of trial and error. Reading the right material and being at the right directory at the right time helped very much.


4) Ok. Now you should have a seperate TPC-H folder. Open it and look for the makefile.suite file. Make a copy of this file and paste it anywhere in your current directory and call it makefile.

5) I would recommend also opening a few helpful files. Open the README, PORTING.NOTES, your makefile and config.h. These are all located in the TPC-H folder you extracted from the tar.gz and help very much.

6) The main thing here is you must edit your makefile in order to get the DBGEN and QGEN to start up. Open your makefile and you should see a bunch of revisions. Scroll down until you reach "CHANGE NAME OF ANSI COMPILER HERE".

7) I am going to paste what the makefile looked like before and what it looked like after I made the appropriate changes to get things going.

START BEFORE:

CHANGE NAME OF ANSI COMPILER HERE

CC =

Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
                                 SQLSERVER, SYBASE, ORACLE
Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS, 
                                 SGI, SUN, U2200, VMS, LINUX, WIN32 
Current values for WORKLOAD are:  TPCH

DATABASE=

MACHINE =

WORKLOAD =

CFLAGS = -g -DDBNAME=\"dss\" -D$(MACHINE) -D$(DATABASE) -D$(WORKLOAD) -DRNG_TEST -D_FILE_OFFSET_BITS=64 LDFLAGS = -O

The OBJ,EXE and LIB macros will need to be changed for compilation under
 Windows NT

OBJ = .o

EXE =

LIBS = -lm


END BEFORE

    NOTE: The only things I needed to change were the following fields: CC, DATABASE, MACHINE and WORKLOAD. You can find all of choices available for these fields in the comments. Choose which applies to you. If you want to know more details about the different choices available, now is the time to look at the config.h file that you should have opened. Now take a look at my AFTER makefile example. NOTE: ACTUALLY...the CC field does not have any options. You just have to figure out which compiler you have. For myself I got lucky. I googled GCC and found out that for most code it's not capitalized and typed in lowercase as gcc. I had many problems with a capitalized GCC, so you may want to try and lowercase the CC field.

START AFTER:

CHANGE NAME OF ANSI COMPILER HERE

CC = gcc

Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
                                 SQLSERVER, SYBASE, ORACLE
Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS, 
                                 SGI, SUN, U2200, VMS, LINUX, WIN32 
Current values for WORKLOAD are:  TPCH

DATABASE= SQLSERVER

MACHINE = LINUX

WORKLOAD = TPCH

CFLAGS = -g -DDBNAME=\"dss\" -D$(MACHINE) -D$(DATABASE) -D$(WORKLOAD) -DRNG_TEST -D_FILE_OFFSET_BITS=64 LDFLAGS = -O

The OBJ,EXE and LIB macros will need to be changed for compilation under
 Windows NT

OBJ = .o

EXE =

LIBS = -lm


END AFTER


8) DON'T FORGET TO SAVE YOUR MAKEFILE! Now, if you don't already have a terminal window open, open it now. CD to your TPC-H folder and make sure you are able to LS the directory and find your makefile.

9) After you have made the specifications to the makefile appropriately, you should be able to just type the 'make' command in your terminal. After the makefile is done compiling you should have DBGENerated .tbl files in your folder.

THATS ALL I HAVE FOR NOW. GOOD LUCK (9-5-10)


I am having some issue with the qgen right now. I keep getting this error message...

juan@ubuntu:~/Largedatasets/TPCH$ ./qgen -- using 1284501260 as a seed to the RNG Open failed for ./1.sql at qgen.c:170

I've looked in the qgen.c file and found line 170, but everything looks fine to me. Maybe there is something I am missing?? If you have any suggestions, that would be appreciated very much. It's very frustrating when you think you have done everything right and the testing doesn't cooperate. Thanks! (9-13-10 @ 4:57PM)

Ok, after reading Max's directions over and over again, I finally realized what I had to do and what I may have been doing wrong. The copy and past into mysql was throwing me off, but now I'm back on track. Now I can test and hopefully get something going by the end of today or even before the meeting.

(9-17-10 @ 10:47 AM)


Max is working on...

As I tried to get the Qgen to work, my frustration grew exponentially. I decided to try to create a "simple Qgen" that I can use. I learned that PHP might be a good pairing for MySQL. I picked up a book on Mysql and PHP ("MySQL+PHP") and will try to "imitate" the Qgen. (I would also love to meet at noon if at all possible, for my last Friday class ends at 11:50)


(2010*09*06 18:00)


Picking up from where Juan left...

Executed dbgen from the TPC-H directory with command ./dbgen

Eight tables were created (Note that customer and another table have access restrictions; I made them readable/writable)

Then used the script we used during the workshop to pipe the tables into a MySQL database.

Check tpch2.12.0.doc to see the layout for the different tables.

For the region table, the layout looks like this:

create table region ( R_REGIONKEY integer, R_NAME char(25), R_COMMENT varchar(152)); load data local infile '/home/max/Documents/TPC-H/region.tbl' into table region fields terminated by '|';

Note: the name of the tables should be in lowercase letters.

Once the tables have been "piped" into the database, run ./qgen [number] (where number specifies the template number to be used to generate the MySQL query.)

with ./qgen 3 you should get something like this in the terminal:

max@ubuntu:~/Documents/TPC-H$ ./qgen 3 -- using 1284085826 as a seed to the RNG


select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'FURNITURE' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-04' and l_shipdate > date '1995-03-04' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate; set rowcount 10 go

Copy and paste this query into another terminal that is running MySQL....

Wait for MySQL to finish the task....

When you get the results, it looks something like this:

Empty set (3 hours 55 min 57.33 sec) <----- I started at about 19:20 yesterday but had to go to work at 21:30, so I am not sure about how long it actually took....

Note: ./qgen 1 generates a query that MySQL does not understand.... I am not very familiar with MySQL sintax, so I could not fix it.

At any rate, my usage of qgen is very "unrefined" so to speak.

(2010*09*10 10:19)


So far only query #3 has been useful.... MySQL yells with the others I have tried so far.... It says that the syntax is wrong. I have learn MySQL faster....

(2010*09*10 10:42)




Dr Thomas is working on getting the BenchmarkSQL tool to find the JDBC driver on the Ubuntu netbook.

Grad school panel next week at 4pm. So we may need to start our meeting (Friday afternoon was strongly favored by Juan as a meeting time) and, if we're still going at the 40 minute mark, we'll go to the grad school panel and resume our meeting afterward. Dr Thomas is one of the panelists.

Links from discussion on 8/27/2010

"Pathologies of Big Data", Communications of the ACM, August 2009. Read this by next week. You might need to be on a campus computer to access the full text of the article.

Transaction Processing Council -- we are going to try to get either the TPC-H or TPC-E benchmarks going. Note the "Tools" on the right hand side of the TPC-H and TPC-E pages.

Report on a TPC-H test someone ran on MonetDB. Included here to give you an idea of the kind of report we might want to generate. Their results are, now, old, so our results would be fresher.

PostgresQL - another free DBMS we could compare to MySQL.

Oracle 10g Express DBMS -- a free, starter DBMS product from the giant DBMS company, Oracle

BenchmarkSQL open source software for running TPC-C on a variety of DBMS products. Dr Thomas hasn't got it working quite yet -- something about JDBC (Java to DBMS communications). But maybe you can get it working.

Try to get a TPC benchmark running against MySQL on your netbooks. Any TPC benchmark will do.

Keep track of:

  1. Useful commands to the computer. Ex. "mysql -h hostname -u username -ppassword", for starting mysql up at the command line in Ubuntu.
  2. Useful web sites, useful software
  3. Points where installing or using software got hard/confusing, and how you got around the problem (if you do) -- this tells us where software is not ready for "Joe Public" to use it, which is valuable information for improving the software.
  4. There was something else in the list of things Dr Thomas said we should keep track of... what was it?