MySQL Partitioning

MySQL Partitioning


Music Credit: YouTube Audio Library The MySQL Server Process (mysqld) is currently not running Let’s set the OS PATH variable to pick up commands from the MySQL Installation directory We’ll use this ‘my.cnf’ while starting the MySQL Server Process The MySQL Server Process (mysqld) is now started On another Terminal we have the ‘client’ host that we’ll use to connect to the MySQL Server This ‘.my.cnf’ under User’s home directory will be used by all MySQL Client commands (mysql, mysqladmin etc.) We are inside the MySQL Server instance We are running MySQL Server Community Edition version 5.7 In this episode, we’ll play around with the Tables in the ‘world’ Database The PARTITIOING capability in MySQL is a plugin in MySQL. We’ve listed all MySQL Plugins here We’ll use the ‘City2’ Table for our demonstrations around MySQL Partitions Watch out for the ID column of ‘City2’ Table. That’s what we’ll use to PARTITION the Table There are over 4000 rows in the ‘City2’ Table Let’s first figure out if the ‘City2’ Table is already partitioned or not When we look at the files that corresponds to the ‘City2’ Table, it doesn’t give us an indication of a Table partition. As always, we can see two files, one for the Table format (City2.frm) and one for data & index (City2.ibd) Observe the ‘create_options’ section of the output. It’s empty, and doesn’t give us any hint on Table partition We just selected a row from the ‘City2’ Table with a WHERE clause (ID=1) to filter down the result to just one entry. And of course we got the desired result To know what happens behind the hood of an operation like SELECT, there is a useful command called ‘EXPLAIN’. Let’s run the SELECT command again with ‘EXPLAIN’ asking ‘mysql’ client to explain the query plan This informs us whether the SELECT operation was simple or complex (like join operations), whether an index was used to fetch the result, the length of the key (index) used etc. Importantly, we see well over 4000 rows were scanned to get that one row. So it was a full table scan! For a small table, it’s Ok,but not certainly for Tables with millions of entries One way to improve the performance is to partition the Table. And now that we know the search operation is mostly based on the column ‘ID’, we’ll partition the Table using the column ID. The partition we’ll use here is called RANGE Partition (such as Range of numbers) The Table ‘City2’ is now going to be partitioned horizontally. We’ll call the first partition ‘p0’ and put all rows which has the ‘ID’ column value less than 2000 Likewise, a second partition ‘p1’ where we’ll put all rows in the ‘City2’ Table which has values less than 4000 in the ‘ID’ column Well, a third a final partition ‘p2’ for the ‘City2’ Table, where rest of the entries (rows with IDs greater than 4000 and less than the Maximum value) fall The ‘City2’ Table is now partitioned The ‘SHOW TABLE STATUS’ command for ‘City2’ Table confirms that the Table is partitioned Let’s see the changes at the OS filesystem level Wow, now the Data & Index file of ‘City2’ Table (City2.ibd) is split into several files You’d remember about a Virtual Database in MySQL instance who has all Metadata. Let’s go in there (hint: information_schema) Well, ‘information_schema’ has a Table called ‘PARTITIONS’ which gives us information about all Table Partitions that exist in a MySQL instance There we have it about the ‘City2’ Table Even the ‘SHOW CREATE TABLE’ can also give details about a Table Partition Now on to the most important observation. Whether we achieved any benefits from partitioning the ‘City2’ Table. Let’s query the Table for an entry with the column ‘ID’ having a value 1 The result is same as what we got before partitioning the Table. But to know what happened behind the hood, let’s use the ‘EXPLAIN’ command on SELECT operation Well, in the earlier situation, the SELECT operation did a full Table scan (4000 odd rows), not in this one. The scan was confined to a partition ‘p0’ with only ~2000 entries. So there is indeed performance benefit Let’s change the WHERE clause ‘ID’ value to greater than 2000 and see if the query is hitting the right partition As you can see, now the search is confined to only one partition ‘p1’ with 2000 entries Let’s remove the Table partitions The Table ‘City2’ is no longer a partitioned Table The Data & Index file for the ‘City2’ Table is just a single file now (City.ibd) We are looking at another Table named ‘Stores’ to study a second type of Table partitioning called ‘List’ Partitions Observe that the City ‘Kochi’ has some Stores with IDs (store_id) 1, 5 and 9 Similarly other cities like Mumbai, Bangalore and Hyderabad also have some stores with IDs that do not fall in a sequence The implication of this is that, if we were to partition this Table based on the Cities, we cannot obviously use RANGE partition Just making sure that the table ‘Stores’ is not currently partitioned Another way to figure out if the Table ‘Stores’ is partitioned or not At the OS filesystem level, the Table ‘Stores’ has only two files: one for Table format (Stores.frm) and another for data & index (Stores.ibd) The ‘Stores’ Table has only two columns: (i) store_id an Integer, (ii) location a varchar datatype Let’s select the set of rows from the ‘Stores’ Table that matches the location ‘Kochi’ It returned three rows That’s fine, but we are more interested to know what query plan did MySQL use to get us those three rows. Let’s use the ‘EXPLAIN’ command again on this ‘SELECT’ operation Very clearly, it was a full Table scan (look at the number of rows it searched on the ‘Stores’ Table to get us those three rows that matched the City ‘Kochi’) The same output in a different format That’s the Table we are going to partition and this time around, we’ll create partitions based on the Cities in the Table (‘location’ column) So this time it’s not ‘RANGE”, but ‘LIST’ partitioning by the column, ‘location’ Create a Table partition ‘loc1’ having only those rows in the ‘Stores’ Table which has value for the ‘location’ column set to ‘Kochi’ Similarly, a second partition ‘loc2’ for the Table ‘Stores’ that has all rows with Column ‘location’ having the value ‘Bangalore’ Third partition ‘loc3’ for the Table ‘Stores’ that has all rows with Column ‘location’ having the value ‘Hyderabad’ And a final partition (loc4) that will contain all rows with ‘Mumbai’ as the value for the ‘location’ column in the ‘Stores’ Table So the ‘SHOW CREATE TABLE’ command now confirms that the Table ‘Stores’ is partitioned by LIST There are more data files for the Stores Table now (one for each partition) Straightaway, let’s run the EXPLAIN command on the SELECT statement to see if the query with a specific City as the value for the ‘location’ column is doing a Full Table Scan or only a select partition Well, looks like it searched only three rows in the ‘Stores’ Table and specifically the partition by name ‘loc1’ It’s going to be no different for the other ‘location’ as well. Just that the Table partition that the query lands is going to be a different one As before the ‘SHOW TABLE STATUS’ command is also showing that the ‘Stores’ Table is partitioned (‘Create_options’ field) We’ll have a quick look at one more type of Table Partitioning in MySQL and for that we’ll use the same old ‘City2’ Table It’s called HASH Partitioning and probably the simplest. We just tell MySQL to create a certain number of Partitions for a Table. Behind the hood, it uses modular operation to arrange rows in different Table Partitions. Read more about it in MySQL Documentation So we have again partitioned the ‘City2’ Table into 4 partitions. We can run already familiar commands to see if the Table is partitioned and whether a query is hitting a specific partition and not search the whole Table As you can see, the query is hitting a specific partition (p1) and only select number of rows (1020) and not the whole Table is scanned Thanks for watching!

8 Comments

  • Ashwani Rai says:

    Explaining well about partitions ….great thanks

  • Nitin kumar says:

    easy, thanks.

  • Oracle Dev says:

    thanks man

  • Luiz Gustavo says:

    Thank you very much, that's what I was looking for.

  • Kiên Định says:

    Thank you to my Indian hero, the video is very great it helped me a lot.I hope you continue this partitioning topic

  • Python says:

    If you might can you help me to modify sql command below,this if it is not really make sure you better than
    I use the table to save the website information I collected from the internet with jsoup

    CREATE TABLE `sf_resources` (
    `resources_id` bigint(20) NOT NULL AUTO_INCREMENT,
    `resources_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `resources_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
    `website_id` bigint(20) NOT NULL,
    `status` tinyint(4) NOT NULL DEFAULT '0',
    `resources_identifier` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
    PRIMARY KEY (`resources_id`,`status`,`resources_identifier`),
    UNIQUE KEY `resources_ide` (`resources_identifier`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

    ALTER TABLE `sf_resources` PARTITION BY LIST COLUMNS (`status`)(
    PARTITION status_ready VALUES IN (0),
    PARTITION status_succeed VALUES IN (1)
    )

    Error Code: 1503
    A UNIQUE INDEX must include all columns in the table's partitioning function

  • Keyur Patel says:

    Nice explanation… i want to know in LIST partition here we have just 4 to 5 city but if we have more than 1000 than is any another way to write this code?

  • reshma sontireddy says:

    what a great bgm 🙂 where did you find?

Leave a Reply

Your email address will not be published. Required fields are marked *