Friday, July 29, 2011

Oracle ACE Award - Baron Schwartz

A big congrats to Baron Schwartz for his Oracle ACE Award.

Baron is extremely respected in the MySQL community. He is the Chief Performance Architect at Percona, an independent services provider for the MySQL database server. He is the lead author of High Performance MySQL, and has created several popular open-source software projects for MySQL, including Maatkit.

Thank you Baron.

Tuesday, July 26, 2011

Oracle Provides Early Access to MySQL 5.6 New Features

Just in case you missed the press release:

Oracle Provides Early Access to MySQL 5.6 New Features



At OSCON (the O’Reilly Open Source Convention) in Oregon, Oracle announced that early access to new features of MySQL 5.6 is available for the community to test, deploy and provide feedback.  Read more here

Wednesday, July 13, 2011

Take the time now for gains later.

Regardless of which data warehouse paradigm you follow or have heard of, Kimball or Inmon. We should all agree that the data warehouse is often a requirement for business. Different people want different things and they all want it from your data. The data warehouse is not a new concept and yet they are over looked at times. A warehouse is never complete, it is an evolving entity that adjusts with the requirements it is given. It is up to us to make sure that the access to enterprise data in an accurate and timely manner is easy and the standard. MySQL can handle a data warehouse perfectly.

MySQL databases are designed in numerous ways, some good some bad. A warehouse can take that data and organize it for the best use of others. What concerns or issues do you often hear when it comes to gathering data from your database? It is easy for all of your developers to query and get the same data? How many ways does your company slice and dice data? Who is your target audience for all of this data? Who wants just the important data? How well do you keep your results accurate?

If these are questions you can relate too but you do have not a warehouse then it is time to consider one. They can be considered to be, to big of a project for the current resources, they never work , poor data quality, always has data inconsistencies. These all are common complaints but those are not complaints of the warehouse but more of the execution behind it.

Use the tools available to you to keep the data updated. ETLs (extract , transform, load) are a requirement of course with data warehouse tables, use them to your advantage. Do not get lazy or let others get lazy. Fix data at the source and re-execute the etl. Do not let “quick fixes” that are not done at the source level. Remember, you do not have to back up the warehouse like you do your source, you can always rerun the ETLs.

Be ready to have eager users try to access your data before your ready to release. Be prepared for complaints. Keeping it accurate and easy to access is the key overall. Have it serve a need first, do not solve everything at once but make an impact with good quality fast data. As I said before a warehouses work is never done, you will continue to add data and available slices of data. Take the time to execute correctly now to have long lasting accurate data.

I am a very big supporter of data normalization but a warehouse also has a place and you can have big performance gains from using it.

There is a lot of resources on the net to help you with this:




Tuesday, July 12, 2011

GOSCON

Innovation Nation & GOSCON 2011


GOSCON returns to Washington D.C. on August 23, 2011

The list of speakers is available here: http://goscon.org/speakers.

I will be at the conference this year in support of MySQL and I am eagerly awaiting to meet everyone.
So please stop by and say hello ! 

Tuesday, July 5, 2011

Stored procedures and Triggers


Having multiple ways to achieve a task is something we all enjoy as developers and DBAs. We find, develop and learn new ways to do things better and faster all the time.

At the risk of starting a debate, I am curious on others opinions or practices when it comes to Stored Procedures and Triggers. To use them or not versus code based functions ? Best case use versus worst case use? There is no real wrong answer here as it depends on your development application. Certainly some lean one way over another and there are more than enough valid reasons on both sides of the debate.

Here are couple of my thoughts on the topic....

I come from the dot.com bubble era , and from that I rarely use stored procedures or triggers. Back then PHP was still new, Perl dominated websites with the cgi-bin and MySQL did not have stored procedures or triggers. Thank goodness things have changed. Developing in those days, forced developers to develop in certain ways. Hopefully, we all used functions and classes so we could reuse code as much as possible. Yet we all survived without stored procedures or triggers just fine by having the logic in our code. But did that make it better?
During this time I had to interact with a SQL server and the DBA that ran it. My LAMP stack used MySQL except for the one call to the finance department's SQL server for a yes/no value. ( I actually used FreeTDS for this connection and it worked like a champ, my side of things anyway. No comment on the speed or other issues of the SQL server needed here. ) I did not need or care about the table structures and layout of the finance SQL system. I only needed a yes or no answer from it. It was the finance teams system and they protected it, rightfully so. So in that case it made perfect sense, for a stored procedure. I handled the code on my side , made a call into the SQL server got the answer and got right out.

Now days of course we have stored procedures in MySQL, do we use them? Is it just an old school development versus modern development in terms of who uses them? I have seen them used at times when it just made it harder to track down a problem. Tracing code across six different functions and cursing the entire way at the missing comments in the code only to find a CALL statement. So we end up going back to the DB to find the query and hopefully do not break the site as we test changes. At the same time, I think they are great if you do need to compartmentalize your logic away from others.

It really all comes down to your application. Where do you want your logic to live? The biggest thing we need to take into account when we decide this is, what do your benchmarks say about your query and do you value the results? Do you want to take advantage of not passing the data across a network and keeping it local to the database? Are you ok with the a separate cache per connection to a stored procedure versus a globally cached query? Depending on the query and how often it is used plays a big part in that decision.

A simple example of stored procedure using the “menagerie database”  is below:

DROP PROCEDURE IF EXISTS addnewpet;
DELIMITER //
CREATE PROCEDURE addnewpet(IN petname VARCHAR(20) , ownername VARCHAR(20), speciestype VARCHAR(20), gender CHAR(1) , bday date )
BEGIN
INSERT INTO pet (name,owner,species,sex,birth) VALUES (petname,ownername,speciestype,gender,bday);
INSERT INTO event VALUES (petname, date_format(NOW(),'%Y-%m-%d'), 'first visit', NULL) ;
END //
DELIMITER ;
CALL addnewpet('Clifford','Emily','dog','m',date_format(NOW(),'%Y-%m-%d') );

This simple call will add the dog name to both tables when used.

You will notice the one of the problems with stored procedures is that you might end up having to pass a lot of variables that might already be defined in your class.

Triggers are another example where history has had developers learn to query data after insert and/or have scripts check the status of data to then make changes as needed. Triggers avoid this for you and should be used whenever possible. Let MySQL handle the adjustments for you immediately and avoid having to write code just for a simple update. Granted as situations get more complex the need might arise for function based code as well. Make sure that all code is optimized before used in the trigger as trigger options are limited.

Replication concerns with triggers are addressed here:

A quick example can be shown using the “menagerie database

mysql> SELECT TRIGGER_NAME, EVENT_MANIPULATION, EVENT_OBJECT_TABLE, ACTION_STATEMENT
FROM INFORMATION_SCHEMA.TRIGGERS
WHERE TRIGGER_SCHEMA='menagerie';
No triggers so we will create one. In this example we can assume that people forget to update the death of pets in the pet table. They note it in events but that is it.
So a simple trigger will keep this updated for us :
delimiter ;;
CREATE TRIGGER ed_update AFTER INSERT ON event
for each row
begin
UPDATE pet p
INNER JOIN event e ON p.name = e.name
SET p.death = e.date
WHERE e.remark = 'death' AND p.death IS NULL;
end;;
delimiter ;

This trigger just updated the pets table for us quickly.


To boil down my thoughts on stored procedures and triggers.... When you are considering a stored procedure make sure to benchmark and test to confirm your ok with the cache results. A stored procedure is best when the global cache is not going to be more effective overall. If tests show that the results are faster handling large data sets with stored procedures versus code base changes then also a good option to use stored procedures. Testing all options is critical.

Triggers can be a great addition to clean up additional scripting work that can be done immediately upon data insert. Again test and confirm to be sure your aware of all the options of using them. Options with triggers are limited so do all the explains ahead of time.

More links here: