Friday, June 17, 2011

The MySQL Librarian

The MySQL community team wants to ensure that our community engagement tools remain relevant, popular and above all gives you, the user, a clear strong voice. We are always here for your opinions, comments and feedback. We have started the process of examining what is useful, what could be enhanced and even what could be deprecated. 
 
We are currently reviewing your feedback as we strive to enhance and grow our MySQL community. For example, if you have not yet taken the poll on “What content do you value most on planet.mysql.com” please do. Keep a lookout for exciting news and changes as we move forward with http://planet.mysql.com improvements as well as enhancements with news and events for our community. 
 
In the meantime, it would appear that one of our features has become redundant. The MySQL Librarian was originally created as a collection of the best links found on the web when it comes to MySQL. However, the same content is usually found easily, first and better presented on the community sites http://planet.mysql.com and http://forums.mysql.com . Traffic to the librarian has been on the steady decline as the community also uses the power of search engines to find the same information as well. 
 
Since the MySQL Librarian is only a collection of links, the code and other artifacts will all still be available for our community in their respected locations. However, the function of http://dev.mysql.com/librarian/ we only be available until the end of the month. 

As always, many thanks to the great MySQL community members. 


Thursday, June 16, 2011

ATON || NOTA ? Ip address IPV4 ? IPV6 ?


IP address storage is something that some do right and others have done wrong for years. How many times have you seen or created a MySQL table having something very similar to the following.

CREATE TABLE `example_table` (
  `user_id` int(9)  NOT NULL AUTO_INCREMENT,
  `date_recorded` datetime DEFAULT NULL,
  `ip` varchar(15) DEFAULT NULL,
  PRIMARY KEY `user_id` (`user_id`),
  KEY `date_recorded` (`date_recorded`)
) ENGINE=InnoDB ;

Hopefully you never have and never run across it. What an optimized world it would be if that was the case.  It appears harmless, after all what could it hurt ? You need to store your ip address and it has those pesky periods in it, so why not varchar(15)?  

Human readable ips are just that, just for us. The database has no need to keep them in human readable form. So your field that requires a min 15 characters, to be able to store all of the variations of the ip, and another byte to store the length of the string, now requires ~16 bytes. Your wasting space by storing IP data in varchar fields, it is an integer , store it as an unsigned integer.  They are integers treat them that way.   

CREATE TABLE `table_example` (
  `user_id` int(9)  NOT NULL AUTO_INCREMENT,
  `date_recorded` datetime DEFAULT NULL,
  `ip_address`   INT UNSIGNED NOT NULL
,
  PRIMARY KEY `user_id` (`user_id`),
  KEY `date_recorded` (`date_recorded`)
) ENGINE=InnoDB ;



Now just saved you ~6 bytes per row which will add up over time.   The ip address just needs to be inserted with the function INET_ATON. 
example below:
 INSERT INTO table_example VALUES (NULL ,NOW() , INET_ATON('213.136.52.29') ) ;

Pulled back out to test :
select ip_address ,INET_NTOA(ip_address) from table_example  ;
+------------+-----------------------+
| ip_address | INET_NTOA(ip_address) |
+------------+-----------------------+
| 3582473245 | 213.136.52.29 |
+------------+-----------------------+
As you can see the example is simple. As long as you enter and pull information from your tables wrapped accordingly you will save space, and gain speed.  But is it worth it just for 6 to 10 bytes? Yes and the reason is hopefully your database will scale and grow. Those wasted bytes will add up. If you wanted an Index on the ip address field then that will also now be smaller and also faster.

http://dev.mysql.com/doc/refman/5.5/en/miscellaneous-functions.html#function_inet-aton
http://dev.mysql.com/doc/refman/5.5/en/miscellaneous-functions.html#function_inet-ntoa

What about IPV6 though ? Ah those tricky ips...

Well this appears to be addressed in 5.6.3 
This example taken right from the 5.6 documentation:
 
SELECT INET6_NTOA(INET6_ATON('fdfe::5a55:caff:fefa:9089'));
        -> 'fdfe::5a55:caff:fefa:9089'

mysql> SELECT INET6_NTOA(INET6_ATON('10.0.5.9'));
        -> '10.0.5.9'

mysql> SELECT INET6_NTOA(UNHEX('FDFE0000000000005A55CAFFFEFA9089'));
        -> 'fdfe::5a55:caff:fefa:9089'
mysql> SELECT INET6_NTOA(UNHEX('0A000509'));
        -> '10.0.5.9'
 

Friday, June 10, 2011

SELF Shirts

The SELF Linux show ramps up to full swing tomorrow. We will have the new MySQL 5.5 shirts as well as Linux shirts. Please stop by say hi and get a shirt while they last!

I was glad to see the support for MySQL on day one, can't wait until tomorrow.

Monday, June 6, 2011

Monitoring just a blink away...


I bring this up because of recent conversations I have found myself in.

Over the years I have gone back and forth between OSX & Linux laptops. Over these same years I was also monitoring several MySQL databases. I have found numerous server side solutions for you to be able to monitor your database, Nagios and of course MySQL Enterprise Monitor to list a couple. Both of these are great solutions and allow you to try and get some sleep and night.

Years ago, while using OS X, I was enlightened to learn about a free module called Geektool, it is not new, but I often find people are unaware of it.  I have had at least three conversations about it lately. While Nagios and enterprise can send me emails and sms messages, even some that I don't want. Geektool allowed me to have a pulse of whatever I wanted to monitor easily as I was busy working on other things.

Here is how it works, Geek Tool is a system preferences module for Mac OSX. It has the option to display url images and/or output from a script on top of your desktop background. So you can always have the results at a blink of your eye. I used the "shell mode" option and created Perl scripts that live locally in my LAMP database server environment. When these scripts are executed they output a range of information from a count of rows in the processlist , seconds behind master, InnoDB, stats as well as time stamps on the last run of key reports. I had key entities for the company db that I wanted available at a moments notice and displayed on top of my background and yet out of the way from desktop work. You could even tail your MySQL error logs if you wanted. Someone could ask a question and with a blink I could look to the corner of my desktop and have an answer. All of the data is dependent on the refresh thresholds that you determine. So you can have some checks every 10 seconds while another item checked every hour. It is all up to you.

Yes it would be just as easy to run the same Perl script via a terminal when asked a question. Having the results up in the corner of the screen though, I was able to see things becoming issues before they became issues. While I was off writing PHP or Perl scripts, I was also able to watch the database processlist slowly rise or slave start to fall behind and I knew someone was doing something that I needed to check on.

Was I able to catch everything and sleep well at night... No. This is just an additional monitor to keep we aware of more easily as I worked. I had Nagios, MySQL Enterprise Monitor and custom phone call alert system to handle the overnight monitoring.

The Database is of course critical, anyway to have stats and references easily, has always helped. You can use Geektool for numerous things. I happen to have used it to keep quick tabs on key stats for seven different databases, all with one look.

I have not yet found a Linux desktop replacement that I like as well yet, I am searching though.

A screenshot from their site can be seen here: http://projects.tynsoe.org/en/geektool/shots/shot02.jpg They have many more on their site.

URL here if your curious for more: http://projects.tynsoe.org/en/geektool/

Friday, June 3, 2011

Smart Selects for replicated systems


Over the last few years I have worked with a few different companies and their replication databases. Your replication choice is yours and what your needs are. I have enjoyed relayed replication and the reason being is that it gives you the chance to make your applications even smarter. 

One reason is that if you did loose a master your slave can become the new master and you still have replication active as you rebuild the down server.

Another reason is the options it can give you for selects. I have seen some systems that have a replicated slave that is never touched. Granted safe, but money spent on hardware and users complaining about application speed. Whichever scripting language you prefer, you can use all of your replicated databases to your advantage. To be safe all writes should to go the master unless you can control and confirm that nothing from the master would try to write to the same location. Selects however can be expanded.

Often users complaints are speed. Everything must be faster. If your database handle only connects to 1 database then your are limited to the performance from that once database. Java has connection pooling (http://java.sun.com/developer/onlineTraining/Programming/JDCBook/conpool.html) and PHP can be used to do the same task. Personally, although I bias, I feel you will have more control with PHP.

The concept is simple, of course you do not want to hurt performance by unnecessary overhead but these steps are and should be very fast. 
  1. connect to the slave
  2. execute SHOW SLAVE STATUS
  3. check 'Seconds_Behind_Master'
  4. Based on threshold you decide you choose which database your going to read from
    1. You can pass this to your connection handle so you have thresholds per query if needed.
    2. You can also create smart connections to never connect to slave if query does not start with select, options are up to you.

This allows you to keep slower data intense queries off the master server. Your options are up to you and what you want to do. How you choose your thresholds depends on your application. How you define what can select from master versus slave only is dependent on your application as well. This is not a replacement for optimization but just expanding your options to get the data your after.

We used three databases in a relayed replication InnoDB environment. We where able to check the third database first , based on thresholds move up to the second database if needed, and lastly run a select on the master if required. Typically the seconds behind master always 0 or under 5 seconds with this TB+ db. If something major was going on then the third database would be further back and the selects would roll to the second database naturally.

This worked very well for our application at the time, and I have used the concept with more than one company. Everything is of course dependent on your application but getting valid results fast to users is a common issue we all face.