{"id":424,"date":"2019-12-09T01:39:00","date_gmt":"2019-12-09T01:39:00","guid":{"rendered":"https:\/\/khanhcode.com\/?p=424"},"modified":"2020-12-20T02:23:38","modified_gmt":"2020-12-20T02:23:38","slug":"svm-food-101-dataset-classification","status":"publish","type":"post","link":"https:\/\/khanhcode.com\/?p=424","title":{"rendered":"SVM Food-101 Dataset Classification"},"content":{"rendered":"\n<p><strong>Project URL:<\/strong> <a href=\"https:\/\/github.com\/Insignite\/SVM-Food101-Classification\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/Insignite\/SVM-Food101-Classification<\/a><\/p>\n\n\n\n<p>This project is my part taken from the main project GitHub\u00a0food_classification. I&#8217;ve done two deep learning algorithms,\u00a0SSD Inception v2 for Card 9-A Object Detection\u00a0and\u00a0AlexNet architecture for DogvsCat Classification, so I would like to dive deeper into the Machine learning field by working on an algorithm even earlier than AlexNet. Support Vector Machines (SVM) for multiclass classification seems fun so I decided to go with it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Introduction<\/strong><\/h2>\n\n\n\n<p><strong>Support Vector Machines (SVM)<\/strong>\u00a0is a supervised learning model with associated algorithms that analyzes data by plotting data points on N-dimensionals graph (N is the number of features) and performs classification by drawing an optimal hyperplane. Data points that closer to the hyperplane influence the position and the orientation of the hyperplane. With this information, we can optimize the hyperplane by fine tuning\u00a0<strong>Cost (C)<\/strong>\u00a0and\u00a0<strong>Gradient (g = gamma substitute variable)<\/strong>. 
A large\u00a0<strong>C<\/strong>\u00a0decreases the margin of the hyperplane, allowing far fewer misclassified points and pushing the hyperplane to fit as many points as possible, whereas a small\u00a0<strong>C<\/strong>\u00a0allows more generalization and a smoother hyperplane. For\u00a0<strong>g<\/strong>, a higher value shrinks each data point&#8217;s radius of influence (measured in Euclidean distance), scaling down the fitted area.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Dataset<\/strong><\/h2>\n\n\n\n<p><a href=\"http:\/\/data.vision.ee.ethz.ch\/cvl\/food-101.tar.gz\">Food-101<\/a>\u00a0is a large dataset consisting of 1000 images for each of 101 types of food. Each image ranges in size from 318&#215;318 to 512&#215;512 pixels.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/raw.githubusercontent.com\/Insignite\/SVM-Food101-Classification\/master\/img\/data_sample.png?ssl=1\" alt=\"\"\/><figcaption><br><\/figcaption><\/figure>\n\n\n\n<p>On Linux, extract the downloaded dataset with the tar command below. 
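<\/p>\n\n\n\n<p>The extraction can also be scripted cross-platform with Python&#8217;s built-in tarfile module. A small self-contained sketch (it builds a tiny stand-in archive so the snippet runs anywhere; food-101.tar.gz would be extracted with the same call):<\/p>\n\n\n\n

```python
# Extract a .tar.gz with Python's stdlib. A tiny stand-in archive is created
# first so this snippet is self-contained; food-101.tar.gz works the same way.
import os, tarfile, tempfile

workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "hello.txt")
with open(sample, "w") as f:
    f.write("food-101 placeholder")

archive = os.path.join(workdir, "sample.tar.gz")
with tarfile.open(archive, "w:gz") as tar:   # build the stand-in archive
    tar.add(sample, arcname="hello.txt")

outdir = os.path.join(workdir, "out")
with tarfile.open(archive, "r:gz") as tar:   # same call for food-101.tar.gz
    tar.extractall(outdir)
print(sorted(os.listdir(outdir)))  # -> ['hello.txt']
```

\n\n\n\n<p>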
On Windows, use an archive extractor such as WinRAR.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>tar xzvf food-101.tar.gz<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Dataset Structure<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>food-101\n  |_ images\n      |_ ***CLASSES FOLDER***\n          |_ ***IMAGES BELONGING TO THE PARENT CLASS***\n  |_ meta\n      |_ classes.txt\n      |_ train.json\n      |_ train.txt\n      |_ test.json\n      |_ test.txt\n      |_ labels.txt\n  |_ license_agreement.txt\n  |_ README.txt<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Dataset Classes<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>apple_pie\t    eggs_benedict\t     onion_rings\nbaby_back_ribs\t    escargots\t\t     oysters\nbaklava\t\t    falafel\t\t     pad_thai\nbeef_carpaccio\t    filet_mignon\t     paella\nbeef_tartare\t    fish_and_chips\t     pancakes\nbeet_salad\t    foie_gras\t\t     panna_cotta\nbeignets\t    french_fries\t     peking_duck\nbibimbap\t    french_onion_soup\t     pho\nbread_pudding\t    french_toast\t     pizza\nbreakfast_burrito   fried_calamari\t     pork_chop\nbruschetta\t    fried_rice\t\t     poutine\ncaesar_salad\t    frozen_yogurt\t     prime_rib\ncannoli\t\t    garlic_bread\t     pulled_pork_sandwich\ncaprese_salad\t    gnocchi\t\t     ramen\ncarrot_cake\t    greek_salad\t\t     ravioli\nceviche\t\t    grilled_cheese_sandwich  red_velvet_cake\ncheesecake\t    grilled_salmon\t     risotto\ncheese_plate\t    guacamole\t\t     samosa\nchicken_curry\t    gyoza\t\t     sashimi\nchicken_quesadilla  hamburger\t\t     scallops\nchicken_wings\t    hot_and_sour_soup\t     seaweed_salad\nchocolate_cake\t    hot_dog\t\t     shrimp_and_grits\nchocolate_mousse    huevos_rancheros\t     spaghetti_bolognese\nchurros\t\t    hummus\t\t     spaghetti_carbonara\nclam_chowder\t    ice_cream\t\t     spring_rolls\nclub_sandwich\t    lasagna\t\t     steak\ncrab_cakes\t    lobster_bisque\t     
strawberry_shortcake\ncreme_brulee\t    lobster_roll_sandwich    sushi\ncroque_madame\t    macaroni_and_cheese      tacos\ncup_cakes\t    macarons\t\t     takoyaki\ndeviled_eggs\t    miso_soup\t\t     tiramisu\ndonuts\t\t    mussels\t\t     tuna_tartare\ndumplings\t    nachos\t\t     waffles\nedamame\t\t    omelette\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Dataset Approach<\/strong><\/h3>\n\n\n\n<p>In this project, I will only classify the noodle classes, since I have limited resources for training and testing. There are 5 noodle classes in total:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;'pad_thai', 'pho', 'ramen', 'spaghetti_bolognese', 'spaghetti_carbonara']\n<\/code><\/pre>\n\n\n\n<p>With 5 classes, I have 5000 images in total, which&nbsp;<code>train.json<\/code>&nbsp;and&nbsp;<code>test.json<\/code>&nbsp;split into 3750 training and 1250 test images respectively.<\/p>\n\n\n\n<p>Let&#8217;s load in the data through&nbsp;<code>train.json<\/code>. But first, let&#8217;s look at how the data is labeled.<\/p>\n\n\n\n<p><strong>(Below is a very small sample of train.json content for ONLY the 5 classes I am targeting. 
The original train.json has all 101 classes.)<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\n    \"pad_thai\": &#91;\"pad_thai\/2735021\", \"pad_thai\/3059603\", \"pad_thai\/3089593\", \"pad_thai\/3175157\", \"pad_thai\/3183627\"],\n    \"ramen\": &#91;\"ramen\/2487409\", \"ramen\/3003899\", \"ramen\/3288667\", \"ramen\/3570678\", \"ramen\/3658881\"],\n    \"spaghetti_bolognese\": &#91;\"spaghetti_bolognese\/2944432\", \"spaghetti_bolognese\/2969047\", \"spaghetti_bolognese\/3087717\", \"spaghetti_bolognese\/3153075\", \"spaghetti_bolognese\/3659120\"],\n    \"spaghetti_carbonara\": &#91;\"spaghetti_carbonara\/2610045\", \"spaghetti_carbonara\/2626986\", \"spaghetti_carbonara\/3149149\", \"spaghetti_carbonara\/3516580\", \"spaghetti_carbonara\/3833174\"],\n    \"pho\": &#91;\"pho\/2599236\", \"pho\/2647478\", \"pho\/2654197\", \"pho\/2696250\", \"pho\/2715359\"]\n}\n<\/code><\/pre>\n\n\n\n<p>SVM training requires a label list and a feature list, so I will load the data from\u00a0<code><strong>train.json<\/strong><\/code>\u00a0into a data frame and create a feature list for both the HOG and transfer learning approaches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Train Dataframe\n         filename  label\n0     1004763.jpg      0\n1     1009595.jpg      0\n2     1011059.jpg      0\n3     1011238.jpg      0\n4     1013966.jpg      0\n...           ...    
...\n3745   977656.jpg      4\n3746   980577.jpg      4\n3747   981334.jpg      4\n3748   991708.jpg      4\n3749   992617.jpg      4\n\n&#91;3750 rows x 2 columns]\n\nHOG Train Feature Shape with PCA\n(3750, 1942)\nTransfer Learning Train Feature Shape\n(3750, 6400)\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Training<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Training Approach<\/strong><\/h3>\n\n\n\n<p>I built an SVM classifier with two approaches:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Histogram of Oriented Gradients (HOG)<\/strong><\/h4>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/raw.githubusercontent.com\/Insignite\/SVM-Food101-Classification\/master\/img\/hog.PNG?ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<p>The HOG image preserves the shape of objects very well, which allows for edge detection. Input images are reshaped to 227x227x3 (a higher pixel count makes training much slower but increases accuracy). I also applied Principal Component Analysis (PCA), a method that reduces the number of features (i.e., the dimensionality) of the data while retaining as much information as possible.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Transfer Learning<\/strong><\/h4>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/raw.githubusercontent.com\/Insignite\/SVM-Food101-Classification\/master\/img\/AlexNet.png?ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<p>Transfer learning is a technique that uses a pre-trained model to build a new custom model or to perform feature extraction. 
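<\/p>\n\n\n\n<p>Schematically, feature extraction treats the truncated network as a fixed function from images to vectors. The sketch below uses a hypothetical frozen random projection as a stand-in for the real network (the actual pre-trained AlexNet used in this project is my teammate&#8217;s model):<\/p>\n\n\n\n

```python
# Schematic of feature extraction with a frozen network: images go in, the
# flattened activations of the cut-off layer come out as SVM feature vectors.
# A fixed random projection stands in for the real truncated AlexNet here.
import numpy as np

rng = np.random.RandomState(0)
W = rng.randn(227 * 227 * 3, 16) * 0.01     # frozen "network" weights (hypothetical size)

def extract_features(batch):
    """Map images of shape (N, 227, 227, 3) to fixed-length feature vectors."""
    flat = batch.reshape(len(batch), -1)
    return np.maximum(flat @ W, 0)          # ReLU nonlinearity, as in AlexNet

images = rng.rand(4, 227, 227, 3)           # stand-in for resized food photos
feats = extract_features(images)
print(feats.shape)  # -> (4, 16); the real truncated AlexNet yields 6400-dim features
```

\n\n\n\n<p>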
In this project, I will use a pre-trained\u00a0AlexNet\u00a0model from my teammate for feature extraction. AlexNet input is always 227x227x3, so I will reshape all images to this dimension. I built a new model from all layers of my teammate&#8217;s AlexNet up to the\u00a0flatten layer (displayed in the figure), which gives an output of 5x5x256 = 6400 training features.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Training parameters<\/strong><\/h3>\n\n\n\n<p>SVM has three important parameters to be wary of: kernel type, C, and g (C and g are explained in the\u00a0Introduction\u00a0section). The choice of kernel depends largely on whether the data points are linearly separable. Let&#8217;s plot 151 images, using their first 2 of 6400 features, under different SVM kernels. All three plots use C = 0.5 and g = 2.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/raw.githubusercontent.com\/Insignite\/SVM-Food101-Classification\/master\/img\/kernel.png?ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<p>It seems the data points can be classified decently well with all three kernels, but this uses only the first 2 features. What if we plotted all 6400 features? One kernel would definitely outperform the others. There are still C and g that I can adjust to optimize the hyperplane. 
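<\/p>\n\n\n\n<p>The kernel comparison above can be sketched in code (scikit-learn assumed, with hypothetical 2-feature toy data standing in for the plotted food features):<\/p>\n\n\n\n

```python
# Fit the same 2-feature toy data with the three kernels being compared,
# using the same C = 0.5 and gamma = 2 as the plots. Data is hypothetical.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = rng.randn(151, 2)                    # stand-in for the first 2 of 6400 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # a linearly separable labeling

scores = {}
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=0.5, gamma=2).fit(X, y)
    scores[kernel] = clf.score(X, y)     # training accuracy per kernel
print(scores)
```

\n\n\n\n<p>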
Let&#8217;s take a look at plots for various values of C and g.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/raw.githubusercontent.com\/Insignite\/SVM-Food101-Classification\/master\/img\/gamma_sample.PNG?ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/raw.githubusercontent.com\/Insignite\/SVM-Food101-Classification\/master\/img\/c_sample.PNG?ssl=1\" alt=\"\"\/><figcaption>source: <a href=\"https:\/\/medium.com\/all-things-ai\/in-depth-parameter-tuning-for-svc-758215394769\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/medium.com\/all-things-ai\/in-depth-parameter-tuning-for-svc-758215394769<\/a><\/figcaption><\/figure>\n\n\n\n<p>With so many ways C and g can tune the hyperplane, how can we find the optimal combination? Let&#8217;s do a grid search: essentially, running cross validation for every combination of kernel, C, and g over a certain range. According to the\u00a0<a href=\"https:\/\/www.csie.ntu.edu.tw\/~cjlin\/papers\/guide\/guide.pdf\">A Practical Guide to Support Vector Classification<\/a>\u00a0paper, exponentially growing sequences of C and g give the best results. I will use the paper&#8217;s recommended ranges C = 2<sup>-5<\/sup>, 2<sup>-3<\/sup>, &#8230;, 2<sup>15<\/sup>\u00a0and g = 2<sup>-15<\/sup>, 2<sup>-13<\/sup>, &#8230;, 2<sup>3<\/sup>. With all three parameters, I was able to create 396 combinations. Below is a sample of a small set of combination runs.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/raw.githubusercontent.com\/Insignite\/SVM-Food101-Classification\/master\/img\/grid_search.PNG?ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<p>After 396 cross validation runs with different parameters, the combination with the highest accuracy is kernel = linear, C = 0.5, and g = 2. 
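<\/p>\n\n\n\n<p>Such a search can be sketched with scikit-learn&#8217;s GridSearchCV over the same exponential ranges (toy data again stands in for the real extracted features; the exact combination count depends on which kernels and step sizes are included):<\/p>\n\n\n\n

```python
# Grid search over exponentially spaced C and gamma, as the LIBSVM practical
# guide recommends (C = 2^-5, 2^-3, ..., 2^15; g = 2^-15, 2^-13, ..., 2^3).
# Two toy 2-D blobs stand in for the real extracted features.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(40, 2) + 1.5, rng.randn(40, 2) - 1.5])
y = np.array([0] * 40 + [1] * 40)

param_grid = {
    "kernel": ["linear", "rbf"],
    "C": (2.0 ** np.arange(-5, 17, 2)).tolist(),      # 2^-5, 2^-3, ..., 2^15
    "gamma": (2.0 ** np.arange(-15, 5, 2)).tolist(),  # 2^-15, 2^-13, ..., 2^3
}
# 5-fold cross validation for every kernel/C/gamma combination
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

\n\n\n\n<p>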
Now we are ready to train our model.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Training Model<\/strong><\/h3>\n\n\n\n<p>I initially used Scikit-Learn to train the SVM model, but it took extremely long for reasons I still don&#8217;t know. Following a suggestion I stumbled upon, I switched over to\u00a0<a href=\"https:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvm\/\">LIBSVM<\/a>\u00a0and was able to reduce training time significantly.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Result<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Histogram of Oriented Gradients (HOG)<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>- Training Validation Accuracy: 81.0%\n- Test Accuracy: 96.0%<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Transfer Learning<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>- Cross Validation Accuracy: 57%\n- Test Accuracy: 68.2%<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/raw.githubusercontent.com\/Insignite\/SVM-Food101-Classification\/master\/img\/result.PNG?ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n\n\n\n<p>The HOG approach has much higher accuracy than the transfer learning approach. This matches my expectation, because transfer learning on the AlexNet model requires input images to go through a series of filters, which leads to loss of detail and a reduction in features. 
My prediction is that if the transfer learning approach took features from earlier layers, rather than going up to the last convolutional layer of AlexNet, the accuracy would be better, because layers toward the beginning of the AlexNet architecture yield many more features than later layers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Project URL: https:\/\/github.com\/Insignite\/SVM-Food101-Classification This project is my part taken from the main project GitHub\u00a0food_classification. I&#8217;ve done two deep learning algorithms,\u00a0SSD Inception v2 for Card 9-A Object Detection\u00a0and\u00a0AlexNet architecture for DogvsCat Classification, so I would like to dive deeper into the Machine learning field by working on an algorithm even earlier than AlexNet. Support Vector Machines [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":425,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-sidebar-layout":"default","site-content-layout":"default","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat"
,"background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[4],"tags":[],"class_list":["post-424","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-projects"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/khanhcode.com\/wp-content\/uploads\/2020\/12\/svm_sample.png?fit=921%2C473&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/khanhcode.com\/index.php?rest_route=\/wp\/v2\/posts\/424","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/khanhcode.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/khanhcode.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/khanhcode.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/khanhcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=424"}],"version-history":[{"count":1,"href":"https:\/\/khanhcode.com\/index.php?rest_route=\/wp\/v2\/posts\/424\/revisions"}],"predecessor-version":[{"id":426,"href":"https:\/\/khanhcode.com\/index.php?rest_route=\/wp\/v2\/posts\/424\/revisions\/426"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/khanhcode.com\/index.php?rest_route=\/wp\/v2\/media\/425"}],"wp:attachment":[{"href":"https:\/\/khanhcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/khanhcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/khanhcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}