Tuesday, May 31, 2011

If your data lacks integrity you are, in a word, doomed.

Recently I ran across across a website and their related blog ( http://channelmeter.com/ & http://insidechannelmeter.wordpress.com/ ) . The company's focus is the analytical data of You Tube. Naturally with the size of You Tube the analytical data is going to be big.  So I knew this would not be a company that is just taking a few hundred names from a web form each day. I saw from  their blog posting that they use MySQL community edition 5.5. I reached out to asked them a few questions. “We are storing tens of millions of rows of data, and adding millions more every day.” – Dave Storrs Co-Founder, ChannelMeter.com. To me this is showing, once again, how well a company can be built and run with an open source database and tools.
They rely on the default InnoDB storage engine in MySQL because of the ACID compliance and data integrity it offered them. This being supported by the use of transactions and foreign keys. “If your data lacks integrity you are, in a word, doomed.”– Dave Storrs. The tradeoff, if any, for this stability was speed. They can still address some speed issues with faster machines, indexes and additional use of caching. They currently have plans to utilize a 10 minute caching timeout and deploy Varnish. 
As with any start up, they have run across a few issues. The rapid growth of their data has required them to evaluate their hard drives and options of partitioning or sharding their data. They have used the MySQL community to help them with any problems. “Whenever I encounter difficulties related to the DB, the answer is almost always to be found in the documentation or the forums.” – Dave Storrs. 
While I might be biased, I am still shocked that is it not standard to run a web server and company on a LAMP stack environment. 

More information is available via their blog: http://insidechannelmeter.wordpress.com/2011/05/25/advice-to-developers-on-managing-mysql/

So the long and short of it...... Way to go open source !

Wednesday, May 18, 2011

Let the communities voice be heard....

MySQL is going to revitalize planet.mysql.com and we want to hear from you! This is a community platform and you should have a say. What would you like to see on planet.mysql.com ? Please feel free to take the poll  titled "What content do you value most on planet.mysql.com?" or respond with ideas if you do not see it listed.

Friday, May 13, 2011

Taking a look at MySQL Cluster -- and saving money in the long run

Over the last few years I have worked for few different companies that utilized versions of relayed replication with the MySQL databases. Their data handler connections where done with Java connection pooling or custom built PHP/PERL code. This allowed each company to spread the load of heavy queries and maintain a decent up time based on the load of their servers. One of the companies comes to mind often when I think of the MySQL cluster.

The company was growing and we made plans to move out of leased servers and build out our racks in a co-location center. This would have been a great time to think ahead and move towards or at the very least heavily evaluate a MySQL Cluster solution. I often got push back on this topic. Granted hindsight is always 20/20. Everything was new and fresh and we could have tested and benchmarked before live traffic was on the systems. Instead we continued with a basic MySQL replication set up (1 master --> 2 slaves). It did not take us long before we adjusted the databases to a relayed replication environment (1 master --> slave --> slave) . As we advanced with this environment we built out custom PHP handles that could take thresholds per query and allowed us to pick and choose which database the query would run on based on server load and slave lag. This allowed us to keep heavy queries off the master easily. While the relayed replication setup improved our stability and reduced our downtime drastically, downtime was still a concern and issue at times.

Looking back I think the push back against a MySQL cluster was more related to costs of the additional servers versus the primarily three servers we ran on. Was that really the best choice? If we would have invested ~20k to ~40k more in additional beefy database servers ( ~20k per box at the time) we potentially could have saved development time & costs,and had a stronger and more stable database. Saving these two items alone could have helped us advance the platform more, since we had more development cycles and been more profitable due to almost no downtime. MySQL cluster is focus is 99.999% availability. I am willing to bet that the savings gained from going to a cluster then would have been more than the ~20k to 40k we saved.

To make looking back even harder, the minimum high availability MySQL cluster set up is three servers, exactly what we had anyway. I would have liked to make it more robust and have at least one other server but even that was not required. If we would have taken a step back, and evaluated our options and not based our opinion on assumptions, I feel we would have avoided the issues we ended up developing around.

I had a great time pushing with all of the development we did at the time. But was it best for the company?

Does that mean MySQL cluster is the answer for everything, no. Does it mean it is well worth taking a break and stepping back to truly evaluate all of your options, big or small company, yes.

Numerous examples of setting up a cluster are available online, it is worth a look.

Monday, May 9, 2011

If you haven't used the MySQL Sandbox... No need to wait...

The MySQL Sandbox tool is a fantastic way to evaluate and install different versions of MySQL. Often when new versions of software come out people tend to be very nervous to upgrade and test with their current stable server environments. You just might not be able to take a risk and upgrade a development or slave servers to test evaluate a new version.
 Giuseppe Maxia “The Data Charmer” has built the MySQL Sandbox (http://mysqlsandbox.net/) to help you avoid these risks. This is just one of the many things Giuseppe does for the MySQL community.

Code is available via Launchpad or CPAN.

Documentation can be found here: http://mysqlsandbox.net/docs.html. In my opinion CPAN contains most of the documentation you would need: http://search.cpan.org/~gmax/MySQL-Sandbox-3.0.12/lib/MySQL/Sandbox.pm

Once installed you can then easily run different versions of MySQL to test queries and review options , this does include replication as well.

Here is a brief overview of an install. 
   cd ~/Downloads/
    tar -vxzf MySQL-Sandbox-3.0.12.tar.gz
    cd MySQL-Sandbox-3.0.12
    perl Makefile.PL
    make install
    mkdir ~/sandboxes
    cd ~/Downloads/

A quick download of the code from labs.mysql.com for example followed by this command :

   make_sandbox mysql-5.6.2-labs-innodb-memcached-linux2.6-x86_64.tar.gz

    “Your sandbox server was installed in $HOME/sandboxes/msb_5_6_2”

So I now have 2 versions of MySQL installed and ready for review.

   mysql -p 
         Server version: 5.5.11-log MySQL Community Server (GPL)

   cd $HOME/sandboxes/msb_5_6_2
    Server version: 5.6.2-labs-innodb-memcached MySQL Community Server (GPL)

For testing purposes you can then take a db or tables and use them in your sandbox. 
      mysqldump -p db_example > db_example.sql

      cd $HOME/sandboxes/msb_5_6_2
   ./use -e "create database db_example"
   ./use db_example < db_example.sql

I am now able to adjust mysql settings in 5.6.2  ($HOME/sandboxes/msb_5_6_2/my.sandbox.cnf)  and test queries on the same data.

Giuseppe has help from his team as well all of whom can be seen here: https://launchpad.net/mysql-sandbox/+topcontributors

Many thanks to Giuseppe and all of your contributors for the sandbox.