Food Wastage Analysis From 1961-2013 Fao

About the dataset

Context Our world population is expected to grow from 7.3 billion today to 9.7 billion in the year 2050. Finding solutions for feeding the growing world population has become a hot topic for food and agriculture organizations, entrepreneurs and philanthropists. These solutions range from changing the way we grow our food to changing the way we eat. To make things harder, the world's climate is changing and it is both affecting and affected by the way we grow our food – agriculture. This dataset provides an insight on our worldwide food production - focusing on a comparison between food produced for human consumption and feed produced for animals.

Content The Food and Agriculture Organization of the United Nations provides free access to food and agriculture data for over 245 countries and territories, from the year 1961 to the most recent update (depends on the dataset). One dataset from the FAO's database is the Food Balance Sheets. It presents a comprehensive picture of the pattern of a country's food supply during a specified reference period, the last time an update was loaded to the FAO database was in 2013. The food balance sheet shows for each food item the sources of supply and its utilization. This chunk of the dataset is focused on two utilizations of each food item available:

Food - refers to the total amount of the food item available as human food during the reference period. Feed - refers to the quantity of the food item available for feeding to the livestock and poultry during the reference period. Dataset's attributes:

Area code - Country name abbreviation Area - County name Item - Food item Element - Food or Feed Latitude - geographic coordinate that specifies the north–south position of a point on the Earth's surface Longitude - geographic coordinate that specifies the east-west position of a point on the Earth's surface Production per year - Amount of food item produced in 1000 tonnes

This is a simple exploratory notebook that heavily expolits pandas and seaborn

# Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# importing data
df = pd.read_csv("FAO.csv",  encoding = "ISO-8859-1")
pd.options.mode.chained_assignment = None
from sklearn.linear_model import LinearRegression
df
Area Abbreviation Area Code Area Item Code Item Element Code Element Unit latitude longitude ... Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 Y2011 Y2012 Y2013
0 AFG 2 Afghanistan 2511 Wheat and products 5142 Food 1000 tonnes 33.94 67.71 ... 3249.0 3486.0 3704.0 4164.0 4252.0 4538.0 4605.0 4711.0 4810 4895
1 AFG 2 Afghanistan 2805 Rice (Milled Equivalent) 5142 Food 1000 tonnes 33.94 67.71 ... 419.0 445.0 546.0 455.0 490.0 415.0 442.0 476.0 425 422
2 AFG 2 Afghanistan 2513 Barley and products 5521 Feed 1000 tonnes 33.94 67.71 ... 58.0 236.0 262.0 263.0 230.0 379.0 315.0 203.0 367 360
3 AFG 2 Afghanistan 2513 Barley and products 5142 Food 1000 tonnes 33.94 67.71 ... 185.0 43.0 44.0 48.0 62.0 55.0 60.0 72.0 78 89
4 AFG 2 Afghanistan 2514 Maize and products 5521 Feed 1000 tonnes 33.94 67.71 ... 120.0 208.0 233.0 249.0 247.0 195.0 178.0 191.0 200 200
5 AFG 2 Afghanistan 2514 Maize and products 5142 Food 1000 tonnes 33.94 67.71 ... 231.0 67.0 82.0 67.0 69.0 71.0 82.0 73.0 77 76
6 AFG 2 Afghanistan 2517 Millet and products 5142 Food 1000 tonnes 33.94 67.71 ... 15.0 21.0 11.0 19.0 21.0 18.0 14.0 14.0 14 12
7 AFG 2 Afghanistan 2520 Cereals, Other 5142 Food 1000 tonnes 33.94 67.71 ... 2.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0 0
8 AFG 2 Afghanistan 2531 Potatoes and products 5142 Food 1000 tonnes 33.94 67.71 ... 276.0 294.0 294.0 260.0 242.0 250.0 192.0 169.0 196 230
9 AFG 2 Afghanistan 2536 Sugar cane 5521 Feed 1000 tonnes 33.94 67.71 ... 50.0 29.0 61.0 65.0 54.0 114.0 83.0 83.0 69 81
10 AFG 2 Afghanistan 2537 Sugar beet 5521 Feed 1000 tonnes 33.94 67.71 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
11 AFG 2 Afghanistan 2542 Sugar (Raw Equivalent) 5142 Food 1000 tonnes 33.94 67.71 ... 124.0 152.0 169.0 192.0 217.0 231.0 240.0 240.0 250 255
12 AFG 2 Afghanistan 2543 Sweeteners, Other 5142 Food 1000 tonnes 33.94 67.71 ... 9.0 15.0 12.0 6.0 11.0 2.0 9.0 21.0 24 16
13 AFG 2 Afghanistan 2745 Honey 5142 Food 1000 tonnes 33.94 67.71 ... 3.0 3.0 3.0 3.0 3.0 3.0 3.0 2.0 2 2
14 AFG 2 Afghanistan 2549 Pulses, Other and products 5521 Feed 1000 tonnes 33.94 67.71 ... 3.0 2.0 3.0 3.0 3.0 5.0 4.0 5.0 4 4
15 AFG 2 Afghanistan 2549 Pulses, Other and products 5142 Food 1000 tonnes 33.94 67.71 ... 17.0 35.0 37.0 40.0 54.0 80.0 66.0 81.0 63 74
16 AFG 2 Afghanistan 2551 Nuts and products 5142 Food 1000 tonnes 33.94 67.71 ... 11.0 13.0 24.0 34.0 42.0 28.0 66.0 71.0 70 44
17 AFG 2 Afghanistan 2560 Coconuts - Incl Copra 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
18 AFG 2 Afghanistan 2561 Sesame seed 5142 Food 1000 tonnes 33.94 67.71 ... 16.0 16.0 13.0 16.0 16.0 16.0 19.0 17.0 16 16
19 AFG 2 Afghanistan 2563 Olives (including preserved) 5142 Food 1000 tonnes 33.94 67.71 ... 1.0 1.0 0.0 0.0 2.0 3.0 2.0 2.0 2 2
20 AFG 2 Afghanistan 2571 Soyabean Oil 5142 Food 1000 tonnes 33.94 67.71 ... 6.0 35.0 18.0 21.0 11.0 6.0 15.0 16.0 16 16
21 AFG 2 Afghanistan 2572 Groundnut Oil 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
22 AFG 2 Afghanistan 2573 Sunflowerseed Oil 5142 Food 1000 tonnes 33.94 67.71 ... 4.0 6.0 5.0 9.0 3.0 8.0 15.0 16.0 17 23
23 AFG 2 Afghanistan 2574 Rape and Mustard Oil 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 1.0 3.0 5.0 6.0 6.0 1.0 2.0 2 2
24 AFG 2 Afghanistan 2575 Cottonseed Oil 5142 Food 1000 tonnes 33.94 67.71 ... 2.0 3.0 3.0 3.0 3.0 4.0 3.0 3.0 3 4
25 AFG 2 Afghanistan 2577 Palm Oil 5142 Food 1000 tonnes 33.94 67.71 ... 71.0 69.0 56.0 51.0 36.0 53.0 59.0 51.0 61 64
26 AFG 2 Afghanistan 2579 Sesameseed Oil 5142 Food 1000 tonnes 33.94 67.71 ... 1.0 1.0 1.0 2.0 2.0 1.0 1.0 2.0 1 1
27 AFG 2 Afghanistan 2580 Olive Oil 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1 1
28 AFG 2 Afghanistan 2586 Oilcrops Oil, Other 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 1.0 0.0 0.0 3.0 1.0 2.0 2.0 2 2
29 AFG 2 Afghanistan 2601 Tomatoes and products 5142 Food 1000 tonnes 33.94 67.71 ... 2.0 2.0 8.0 1.0 0.0 0.0 0.0 0.0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
21447 ZWE 181 Zimbabwe 2765 Crustaceans 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21448 ZWE 181 Zimbabwe 2766 Cephalopods 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21449 ZWE 181 Zimbabwe 2767 Molluscs, Other 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0 0
21450 ZWE 181 Zimbabwe 2775 Aquatic Plants 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21451 ZWE 181 Zimbabwe 2680 Infant food 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21452 ZWE 181 Zimbabwe 2905 Cereals - Excluding Beer 5521 Feed 1000 tonnes -19.02 29.15 ... 75.0 54.0 75.0 55.0 63.0 62.0 55.0 55.0 55 55
21453 ZWE 181 Zimbabwe 2905 Cereals - Excluding Beer 5142 Food 1000 tonnes -19.02 29.15 ... 1844.0 1842.0 1944.0 1962.0 1918.0 1980.0 2011.0 2094.0 2071 2016
21454 ZWE 181 Zimbabwe 2907 Starchy Roots 5142 Food 1000 tonnes -19.02 29.15 ... 223.0 236.0 238.0 228.0 245.0 258.0 258.0 269.0 272 276
21455 ZWE 181 Zimbabwe 2908 Sugar Crops 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21456 ZWE 181 Zimbabwe 2909 Sugar & Sweeteners 5142 Food 1000 tonnes -19.02 29.15 ... 335.0 313.0 339.0 302.0 285.0 287.0 314.0 336.0 396 416
21457 ZWE 181 Zimbabwe 2911 Pulses 5142 Food 1000 tonnes -19.02 29.15 ... 63.0 59.0 61.0 57.0 69.0 78.0 68.0 56.0 52 55
21458 ZWE 181 Zimbabwe 2912 Treenuts 5142 Food 1000 tonnes -19.02 29.15 ... 1.0 2.0 1.0 2.0 2.0 3.0 4.0 2.0 4 3
21459 ZWE 181 Zimbabwe 2913 Oilcrops 5521 Feed 1000 tonnes -19.02 29.15 ... 36.0 46.0 41.0 33.0 31.0 19.0 24.0 17.0 27 30
21460 ZWE 181 Zimbabwe 2913 Oilcrops 5142 Food 1000 tonnes -19.02 29.15 ... 60.0 59.0 61.0 62.0 48.0 44.0 41.0 40.0 38 38
21461 ZWE 181 Zimbabwe 2914 Vegetable Oils 5142 Food 1000 tonnes -19.02 29.15 ... 111.0 114.0 112.0 114.0 134.0 135.0 137.0 147.0 159 160
21462 ZWE 181 Zimbabwe 2918 Vegetables 5142 Food 1000 tonnes -19.02 29.15 ... 161.0 166.0 208.0 185.0 137.0 179.0 215.0 217.0 227 227
21463 ZWE 181 Zimbabwe 2919 Fruits - Excluding Wine 5142 Food 1000 tonnes -19.02 29.15 ... 191.0 134.0 167.0 177.0 185.0 184.0 211.0 230.0 246 217
21464 ZWE 181 Zimbabwe 2922 Stimulants 5142 Food 1000 tonnes -19.02 29.15 ... 7.0 21.0 14.0 24.0 16.0 11.0 23.0 11.0 10 10
21465 ZWE 181 Zimbabwe 2923 Spices 5142 Food 1000 tonnes -19.02 29.15 ... 7.0 11.0 7.0 12.0 16.0 16.0 14.0 11.0 12 12
21466 ZWE 181 Zimbabwe 2924 Alcoholic Beverages 5142 Food 1000 tonnes -19.02 29.15 ... 294.0 290.0 316.0 355.0 398.0 437.0 448.0 476.0 525 516
21467 ZWE 181 Zimbabwe 2943 Meat 5142 Food 1000 tonnes -19.02 29.15 ... 222.0 228.0 233.0 238.0 242.0 265.0 262.0 277.0 280 258
21468 ZWE 181 Zimbabwe 2945 Offals 5142 Food 1000 tonnes -19.02 29.15 ... 20.0 20.0 21.0 21.0 21.0 21.0 21.0 21.0 22 22
21469 ZWE 181 Zimbabwe 2946 Animal fats 5142 Food 1000 tonnes -19.02 29.15 ... 26.0 26.0 29.0 29.0 27.0 31.0 30.0 25.0 26 20
21470 ZWE 181 Zimbabwe 2949 Eggs 5142 Food 1000 tonnes -19.02 29.15 ... 15.0 18.0 18.0 21.0 22.0 27.0 27.0 24.0 24 25
21471 ZWE 181 Zimbabwe 2948 Milk - Excluding Butter 5521 Feed 1000 tonnes -19.02 29.15 ... 21.0 21.0 21.0 21.0 21.0 23.0 25.0 25.0 30 31
21472 ZWE 181 Zimbabwe 2948 Milk - Excluding Butter 5142 Food 1000 tonnes -19.02 29.15 ... 373.0 357.0 359.0 356.0 341.0 385.0 418.0 457.0 426 451
21473 ZWE 181 Zimbabwe 2960 Fish, Seafood 5521 Feed 1000 tonnes -19.02 29.15 ... 5.0 4.0 9.0 6.0 9.0 5.0 15.0 15.0 15 15
21474 ZWE 181 Zimbabwe 2960 Fish, Seafood 5142 Food 1000 tonnes -19.02 29.15 ... 18.0 14.0 17.0 14.0 15.0 18.0 29.0 40.0 40 40
21475 ZWE 181 Zimbabwe 2961 Aquatic Products, Other 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21476 ZWE 181 Zimbabwe 2928 Miscellaneous 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0

21477 rows Ă— 63 columns

Let's see what the data looks like...

Plot for annual produce of different countries with quantity in y-axis and years in x-axis

df
Area Abbreviation Area Code Area Item Code Item Element Code Element Unit latitude longitude ... Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 Y2011 Y2012 Y2013
0 AFG 2 Afghanistan 2511 Wheat and products 5142 Food 1000 tonnes 33.94 67.71 ... 3249.0 3486.0 3704.0 4164.0 4252.0 4538.0 4605.0 4711.0 4810 4895
1 AFG 2 Afghanistan 2805 Rice (Milled Equivalent) 5142 Food 1000 tonnes 33.94 67.71 ... 419.0 445.0 546.0 455.0 490.0 415.0 442.0 476.0 425 422
2 AFG 2 Afghanistan 2513 Barley and products 5521 Feed 1000 tonnes 33.94 67.71 ... 58.0 236.0 262.0 263.0 230.0 379.0 315.0 203.0 367 360
3 AFG 2 Afghanistan 2513 Barley and products 5142 Food 1000 tonnes 33.94 67.71 ... 185.0 43.0 44.0 48.0 62.0 55.0 60.0 72.0 78 89
4 AFG 2 Afghanistan 2514 Maize and products 5521 Feed 1000 tonnes 33.94 67.71 ... 120.0 208.0 233.0 249.0 247.0 195.0 178.0 191.0 200 200
5 AFG 2 Afghanistan 2514 Maize and products 5142 Food 1000 tonnes 33.94 67.71 ... 231.0 67.0 82.0 67.0 69.0 71.0 82.0 73.0 77 76
6 AFG 2 Afghanistan 2517 Millet and products 5142 Food 1000 tonnes 33.94 67.71 ... 15.0 21.0 11.0 19.0 21.0 18.0 14.0 14.0 14 12
7 AFG 2 Afghanistan 2520 Cereals, Other 5142 Food 1000 tonnes 33.94 67.71 ... 2.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0 0
8 AFG 2 Afghanistan 2531 Potatoes and products 5142 Food 1000 tonnes 33.94 67.71 ... 276.0 294.0 294.0 260.0 242.0 250.0 192.0 169.0 196 230
9 AFG 2 Afghanistan 2536 Sugar cane 5521 Feed 1000 tonnes 33.94 67.71 ... 50.0 29.0 61.0 65.0 54.0 114.0 83.0 83.0 69 81
10 AFG 2 Afghanistan 2537 Sugar beet 5521 Feed 1000 tonnes 33.94 67.71 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
11 AFG 2 Afghanistan 2542 Sugar (Raw Equivalent) 5142 Food 1000 tonnes 33.94 67.71 ... 124.0 152.0 169.0 192.0 217.0 231.0 240.0 240.0 250 255
12 AFG 2 Afghanistan 2543 Sweeteners, Other 5142 Food 1000 tonnes 33.94 67.71 ... 9.0 15.0 12.0 6.0 11.0 2.0 9.0 21.0 24 16
13 AFG 2 Afghanistan 2745 Honey 5142 Food 1000 tonnes 33.94 67.71 ... 3.0 3.0 3.0 3.0 3.0 3.0 3.0 2.0 2 2
14 AFG 2 Afghanistan 2549 Pulses, Other and products 5521 Feed 1000 tonnes 33.94 67.71 ... 3.0 2.0 3.0 3.0 3.0 5.0 4.0 5.0 4 4
15 AFG 2 Afghanistan 2549 Pulses, Other and products 5142 Food 1000 tonnes 33.94 67.71 ... 17.0 35.0 37.0 40.0 54.0 80.0 66.0 81.0 63 74
16 AFG 2 Afghanistan 2551 Nuts and products 5142 Food 1000 tonnes 33.94 67.71 ... 11.0 13.0 24.0 34.0 42.0 28.0 66.0 71.0 70 44
17 AFG 2 Afghanistan 2560 Coconuts - Incl Copra 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
18 AFG 2 Afghanistan 2561 Sesame seed 5142 Food 1000 tonnes 33.94 67.71 ... 16.0 16.0 13.0 16.0 16.0 16.0 19.0 17.0 16 16
19 AFG 2 Afghanistan 2563 Olives (including preserved) 5142 Food 1000 tonnes 33.94 67.71 ... 1.0 1.0 0.0 0.0 2.0 3.0 2.0 2.0 2 2
20 AFG 2 Afghanistan 2571 Soyabean Oil 5142 Food 1000 tonnes 33.94 67.71 ... 6.0 35.0 18.0 21.0 11.0 6.0 15.0 16.0 16 16
21 AFG 2 Afghanistan 2572 Groundnut Oil 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
22 AFG 2 Afghanistan 2573 Sunflowerseed Oil 5142 Food 1000 tonnes 33.94 67.71 ... 4.0 6.0 5.0 9.0 3.0 8.0 15.0 16.0 17 23
23 AFG 2 Afghanistan 2574 Rape and Mustard Oil 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 1.0 3.0 5.0 6.0 6.0 1.0 2.0 2 2
24 AFG 2 Afghanistan 2575 Cottonseed Oil 5142 Food 1000 tonnes 33.94 67.71 ... 2.0 3.0 3.0 3.0 3.0 4.0 3.0 3.0 3 4
25 AFG 2 Afghanistan 2577 Palm Oil 5142 Food 1000 tonnes 33.94 67.71 ... 71.0 69.0 56.0 51.0 36.0 53.0 59.0 51.0 61 64
26 AFG 2 Afghanistan 2579 Sesameseed Oil 5142 Food 1000 tonnes 33.94 67.71 ... 1.0 1.0 1.0 2.0 2.0 1.0 1.0 2.0 1 1
27 AFG 2 Afghanistan 2580 Olive Oil 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1 1
28 AFG 2 Afghanistan 2586 Oilcrops Oil, Other 5142 Food 1000 tonnes 33.94 67.71 ... 0.0 1.0 0.0 0.0 3.0 1.0 2.0 2.0 2 2
29 AFG 2 Afghanistan 2601 Tomatoes and products 5142 Food 1000 tonnes 33.94 67.71 ... 2.0 2.0 8.0 1.0 0.0 0.0 0.0 0.0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
21447 ZWE 181 Zimbabwe 2765 Crustaceans 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21448 ZWE 181 Zimbabwe 2766 Cephalopods 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21449 ZWE 181 Zimbabwe 2767 Molluscs, Other 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0 0
21450 ZWE 181 Zimbabwe 2775 Aquatic Plants 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21451 ZWE 181 Zimbabwe 2680 Infant food 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21452 ZWE 181 Zimbabwe 2905 Cereals - Excluding Beer 5521 Feed 1000 tonnes -19.02 29.15 ... 75.0 54.0 75.0 55.0 63.0 62.0 55.0 55.0 55 55
21453 ZWE 181 Zimbabwe 2905 Cereals - Excluding Beer 5142 Food 1000 tonnes -19.02 29.15 ... 1844.0 1842.0 1944.0 1962.0 1918.0 1980.0 2011.0 2094.0 2071 2016
21454 ZWE 181 Zimbabwe 2907 Starchy Roots 5142 Food 1000 tonnes -19.02 29.15 ... 223.0 236.0 238.0 228.0 245.0 258.0 258.0 269.0 272 276
21455 ZWE 181 Zimbabwe 2908 Sugar Crops 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21456 ZWE 181 Zimbabwe 2909 Sugar & Sweeteners 5142 Food 1000 tonnes -19.02 29.15 ... 335.0 313.0 339.0 302.0 285.0 287.0 314.0 336.0 396 416
21457 ZWE 181 Zimbabwe 2911 Pulses 5142 Food 1000 tonnes -19.02 29.15 ... 63.0 59.0 61.0 57.0 69.0 78.0 68.0 56.0 52 55
21458 ZWE 181 Zimbabwe 2912 Treenuts 5142 Food 1000 tonnes -19.02 29.15 ... 1.0 2.0 1.0 2.0 2.0 3.0 4.0 2.0 4 3
21459 ZWE 181 Zimbabwe 2913 Oilcrops 5521 Feed 1000 tonnes -19.02 29.15 ... 36.0 46.0 41.0 33.0 31.0 19.0 24.0 17.0 27 30
21460 ZWE 181 Zimbabwe 2913 Oilcrops 5142 Food 1000 tonnes -19.02 29.15 ... 60.0 59.0 61.0 62.0 48.0 44.0 41.0 40.0 38 38
21461 ZWE 181 Zimbabwe 2914 Vegetable Oils 5142 Food 1000 tonnes -19.02 29.15 ... 111.0 114.0 112.0 114.0 134.0 135.0 137.0 147.0 159 160
21462 ZWE 181 Zimbabwe 2918 Vegetables 5142 Food 1000 tonnes -19.02 29.15 ... 161.0 166.0 208.0 185.0 137.0 179.0 215.0 217.0 227 227
21463 ZWE 181 Zimbabwe 2919 Fruits - Excluding Wine 5142 Food 1000 tonnes -19.02 29.15 ... 191.0 134.0 167.0 177.0 185.0 184.0 211.0 230.0 246 217
21464 ZWE 181 Zimbabwe 2922 Stimulants 5142 Food 1000 tonnes -19.02 29.15 ... 7.0 21.0 14.0 24.0 16.0 11.0 23.0 11.0 10 10
21465 ZWE 181 Zimbabwe 2923 Spices 5142 Food 1000 tonnes -19.02 29.15 ... 7.0 11.0 7.0 12.0 16.0 16.0 14.0 11.0 12 12
21466 ZWE 181 Zimbabwe 2924 Alcoholic Beverages 5142 Food 1000 tonnes -19.02 29.15 ... 294.0 290.0 316.0 355.0 398.0 437.0 448.0 476.0 525 516
21467 ZWE 181 Zimbabwe 2943 Meat 5142 Food 1000 tonnes -19.02 29.15 ... 222.0 228.0 233.0 238.0 242.0 265.0 262.0 277.0 280 258
21468 ZWE 181 Zimbabwe 2945 Offals 5142 Food 1000 tonnes -19.02 29.15 ... 20.0 20.0 21.0 21.0 21.0 21.0 21.0 21.0 22 22
21469 ZWE 181 Zimbabwe 2946 Animal fats 5142 Food 1000 tonnes -19.02 29.15 ... 26.0 26.0 29.0 29.0 27.0 31.0 30.0 25.0 26 20
21470 ZWE 181 Zimbabwe 2949 Eggs 5142 Food 1000 tonnes -19.02 29.15 ... 15.0 18.0 18.0 21.0 22.0 27.0 27.0 24.0 24 25
21471 ZWE 181 Zimbabwe 2948 Milk - Excluding Butter 5521 Feed 1000 tonnes -19.02 29.15 ... 21.0 21.0 21.0 21.0 21.0 23.0 25.0 25.0 30 31
21472 ZWE 181 Zimbabwe 2948 Milk - Excluding Butter 5142 Food 1000 tonnes -19.02 29.15 ... 373.0 357.0 359.0 356.0 341.0 385.0 418.0 457.0 426 451
21473 ZWE 181 Zimbabwe 2960 Fish, Seafood 5521 Feed 1000 tonnes -19.02 29.15 ... 5.0 4.0 9.0 6.0 9.0 5.0 15.0 15.0 15 15
21474 ZWE 181 Zimbabwe 2960 Fish, Seafood 5142 Food 1000 tonnes -19.02 29.15 ... 18.0 14.0 17.0 14.0 15.0 18.0 29.0 40.0 40 40
21475 ZWE 181 Zimbabwe 2961 Aquatic Products, Other 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
21476 ZWE 181 Zimbabwe 2928 Miscellaneous 5142 Food 1000 tonnes -19.02 29.15 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0

21477 rows Ă— 63 columns

area_list = list(df['Area'].unique())
year_list = list(df.iloc[:,10:].columns)

plt.figure(figsize=(24,12))
for ar in area_list:
    yearly_produce = []
    for yr in year_list:
        yearly_produce.append(df[yr][df['Area'] == ar].sum())
    plt.plot(yearly_produce, label=ar)
plt.xticks(np.arange(53), tuple(year_list), rotation=60)
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3, ncol=8, mode="expand", borderaxespad=0.)
plt.savefig('p.png')
plt.show()
plt.figure(figsize=(24,12))
<Figure size 1728x864 with 0 Axes>
<Figure size 1728x864 with 0 Axes>

Clearly, China, India and US stand out here. So, these are the countries with most food and feed production.

Now, let's have a close look at their food and feed data

Food and feed plot for the whole dataset

sns.factorplot("Element", data=df, kind="count")
plt.show()
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
  warnings.warn(msg)

So, there is a huge difference in food and feed production. Now, we have obvious assumptions about the following plots after looking at this huge difference.

Food and feed plot for the largest producers(India, USA, China)

sns.factorplot("Area", data=df[(df['Area'] == "India") | (df['Area'] == "China, mainland") | (df['Area'] == "United States of America")], kind="count", hue="Element", size=8, aspect=.8)
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
  warnings.warn(msg)
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
  warnings.warn(msg, UserWarning)
<seaborn.axisgrid.FacetGrid at 0x1a218d2550>

Though, there is a huge difference between feed and food production, these countries' total production and their ranks depend on feed production.

Now, we create a dataframe with countries as index and their annual produce as columns from 1961 to 2013.

new_df_dict = {}
for ar in area_list:
    yearly_produce = []
    for yr in year_list:
        yearly_produce.append(df[yr][df['Area']==ar].sum())
    new_df_dict[ar] = yearly_produce
new_df = pd.DataFrame(new_df_dict)

new_df.head()
Afghanistan Albania Algeria Angola Antigua and Barbuda Argentina Armenia Australia Austria Azerbaijan ... United Republic of Tanzania United States of America Uruguay Uzbekistan Vanuatu Venezuela (Bolivarian Republic of) Viet Nam Yemen Zambia Zimbabwe
0 9481.0 1706.0 7488.0 4834.0 92.0 43402.0 0.0 25795.0 22542.0 0.0 ... 12367.0 559347.0 4631.0 0.0 97.0 9523.0 23856.0 2982.0 2976.0 3260.0
1 9414.0 1749.0 7235.0 4775.0 94.0 40784.0 0.0 27618.0 22627.0 0.0 ... 12810.0 556319.0 4448.0 0.0 101.0 9369.0 25220.0 3038.0 3057.0 3503.0
2 9194.0 1767.0 6861.0 5240.0 105.0 40219.0 0.0 28902.0 23637.0 0.0 ... 13109.0 552630.0 4682.0 0.0 103.0 9788.0 26053.0 3147.0 3069.0 3479.0
3 10170.0 1889.0 7255.0 5286.0 95.0 41638.0 0.0 29107.0 24099.0 0.0 ... 12965.0 555677.0 4723.0 0.0 102.0 10539.0 26377.0 3224.0 3121.0 3738.0
4 10473.0 1884.0 7509.0 5527.0 84.0 44936.0 0.0 28961.0 22664.0 0.0 ... 13742.0 589288.0 4581.0 0.0 107.0 10641.0 26961.0 3328.0 3236.0 3940.0

5 rows Ă— 174 columns

Now, this is not perfect so we transpose this dataframe and add column names.

new_df = pd.DataFrame.transpose(new_df)
new_df.columns = year_list

new_df.head()
Y1961 Y1962 Y1963 Y1964 Y1965 Y1966 Y1967 Y1968 Y1969 Y1970 ... Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 Y2011 Y2012 Y2013
Afghanistan 9481.0 9414.0 9194.0 10170.0 10473.0 10169.0 11289.0 11508.0 11815.0 10454.0 ... 16542.0 17658.0 18317.0 19248.0 19381.0 20661.0 21030.0 21100.0 22706.0 23007.0
Albania 1706.0 1749.0 1767.0 1889.0 1884.0 1995.0 2046.0 2169.0 2230.0 2395.0 ... 6637.0 6719.0 6911.0 6744.0 7168.0 7316.0 7907.0 8114.0 8221.0 8271.0
Algeria 7488.0 7235.0 6861.0 7255.0 7509.0 7536.0 7986.0 8839.0 9003.0 9355.0 ... 48619.0 49562.0 51067.0 49933.0 50916.0 57505.0 60071.0 65852.0 69365.0 72161.0
Angola 4834.0 4775.0 5240.0 5286.0 5527.0 5677.0 5833.0 5685.0 6219.0 6460.0 ... 25541.0 26696.0 28247.0 29877.0 32053.0 36985.0 38400.0 40573.0 38064.0 48639.0
Antigua and Barbuda 92.0 94.0 105.0 95.0 84.0 73.0 64.0 59.0 68.0 77.0 ... 92.0 115.0 110.0 122.0 115.0 114.0 115.0 118.0 113.0 119.0

5 rows Ă— 53 columns

Perfect! Now, we will do some feature engineering.

First, a new column which indicates mean produce of each state over the given years. Second, a ranking column which ranks countries on the basis of mean produce.

mean_produce = []
for i in range(174):
    mean_produce.append(new_df.iloc[i,:].values.mean())
new_df['Mean_Produce'] = mean_produce

new_df['Rank'] = new_df['Mean_Produce'].rank(ascending=False)

new_df.head()
Y1961 Y1962 Y1963 Y1964 Y1965 Y1966 Y1967 Y1968 Y1969 Y1970 ... Y2006 Y2007 Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Mean_Produce Rank
Afghanistan 9481.0 9414.0 9194.0 10170.0 10473.0 10169.0 11289.0 11508.0 11815.0 10454.0 ... 18317.0 19248.0 19381.0 20661.0 21030.0 21100.0 22706.0 23007.0 13003.056604 69.0
Albania 1706.0 1749.0 1767.0 1889.0 1884.0 1995.0 2046.0 2169.0 2230.0 2395.0 ... 6911.0 6744.0 7168.0 7316.0 7907.0 8114.0 8221.0 8271.0 4475.509434 104.0
Algeria 7488.0 7235.0 6861.0 7255.0 7509.0 7536.0 7986.0 8839.0 9003.0 9355.0 ... 51067.0 49933.0 50916.0 57505.0 60071.0 65852.0 69365.0 72161.0 28879.490566 38.0
Angola 4834.0 4775.0 5240.0 5286.0 5527.0 5677.0 5833.0 5685.0 6219.0 6460.0 ... 28247.0 29877.0 32053.0 36985.0 38400.0 40573.0 38064.0 48639.0 13321.056604 68.0
Antigua and Barbuda 92.0 94.0 105.0 95.0 84.0 73.0 64.0 59.0 68.0 77.0 ... 110.0 122.0 115.0 114.0 115.0 118.0 113.0 119.0 83.886792 172.0

5 rows Ă— 55 columns

Now, we create another dataframe with items and their total production each year from 1961 to 2013

item_list = list(df['Item'].unique())

item_df = pd.DataFrame()
item_df['Item_Name'] = item_list

for yr in year_list:
    item_produce = []
    for it in item_list:
        item_produce.append(df[yr][df['Item']==it].sum())
    item_df[yr] = item_produce
item_df.head()
Item_Name Y1961 Y1962 Y1963 Y1964 Y1965 Y1966 Y1967 Y1968 Y1969 ... Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 Y2011 Y2012 Y2013
0 Wheat and products 138829.0 144643.0 147325.0 156273.0 168822.0 169832.0 171469.0 179530.0 189658.0 ... 527394.0 532263.0 537279.0 529271.0 562239.0 557245.0 549926.0 578179.0 576597 587492
1 Rice (Milled Equivalent) 122700.0 131842.0 139507.0 148304.0 150056.0 155583.0 158587.0 164614.0 167922.0 ... 361107.0 366025.0 372629.0 378698.0 389708.0 394221.0 398559.0 404152.0 406787 410880
2 Barley and products 46180.0 48915.0 51642.0 54184.0 54945.0 55463.0 56424.0 60455.0 65501.0 ... 102055.0 97185.0 100981.0 93310.0 98209.0 99135.0 92563.0 92570.0 88766 99452
3 Maize and products 168039.0 168305.0 172905.0 175468.0 190304.0 200860.0 213050.0 215613.0 221953.0 ... 545024.0 549036.0 543280.0 573892.0 592231.0 557940.0 584337.0 603297.0 608730 671300
4 Millet and products 19075.0 19019.0 19740.0 20353.0 18377.0 20860.0 22997.0 21785.0 23966.0 ... 25789.0 25496.0 25997.0 26750.0 26373.0 24575.0 27039.0 25740.0 26105 26346

5 rows Ă— 54 columns

Some more feature engineering

This time, we will use the new features to get some good conclusions.

1. Total amount of item produced from 1961 to 2013

2. Providing a rank to the items to know the most produced item

sum_col = []
for i in range(115):
    sum_col.append(item_df.iloc[i,1:].values.sum())
item_df['Sum'] = sum_col
item_df['Production_Rank'] = item_df['Sum'].rank(ascending=False)

item_df.head()
Item_Name Y1961 Y1962 Y1963 Y1964 Y1965 Y1966 Y1967 Y1968 Y1969 ... Y2006 Y2007 Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Sum Production_Rank
0 Wheat and products 138829.0 144643.0 147325.0 156273.0 168822.0 169832.0 171469.0 179530.0 189658.0 ... 537279.0 529271.0 562239.0 557245.0 549926.0 578179.0 576597 587492 19194671.0 6.0
1 Rice (Milled Equivalent) 122700.0 131842.0 139507.0 148304.0 150056.0 155583.0 158587.0 164614.0 167922.0 ... 372629.0 378698.0 389708.0 394221.0 398559.0 404152.0 406787 410880 14475448.0 8.0
2 Barley and products 46180.0 48915.0 51642.0 54184.0 54945.0 55463.0 56424.0 60455.0 65501.0 ... 100981.0 93310.0 98209.0 99135.0 92563.0 92570.0 88766 99452 4442742.0 20.0
3 Maize and products 168039.0 168305.0 172905.0 175468.0 190304.0 200860.0 213050.0 215613.0 221953.0 ... 543280.0 573892.0 592231.0 557940.0 584337.0 603297.0 608730 671300 19960640.0 5.0
4 Millet and products 19075.0 19019.0 19740.0 20353.0 18377.0 20860.0 22997.0 21785.0 23966.0 ... 25997.0 26750.0 26373.0 24575.0 27039.0 25740.0 26105 26346 1225400.0 38.0

5 rows Ă— 56 columns

Now, we find the most produced food items in the last half-century

item_df['Item_Name'][item_df['Production_Rank'] &lt; 11.0].sort_values()
56    Cereals - Excluding Beer
65     Fruits - Excluding Wine
3           Maize and products
53     Milk - Excluding Butter
6        Potatoes and products
1     Rice (Milled Equivalent)
57               Starchy Roots
64                  Vegetables
27           Vegetables, Other
0           Wheat and products
Name: Item_Name, dtype: object

So, cereals, fruits and maize are the most produced items in the last 50 years

Food and feed plot for most produced items

sns.factorplot("Item", data=df[(df['Item']=='Wheat and products') | (df['Item']=='Rice (Milled Equivalent)') | (df['Item']=='Maize and products') | (df['Item']=='Potatoes and products') | (df['Item']=='Vegetables, Other') | (df['Item']=='Milk - Excluding Butter') | (df['Item']=='Cereals - Excluding Beer') | (df['Item']=='Starchy Roots') | (df['Item']=='Vegetables') | (df['Item']=='Fruits - Excluding Wine')], kind="count", hue="Element", size=20, aspect=.8)
plt.show()
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
  warnings.warn(msg)
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
  warnings.warn(msg, UserWarning)

Now, we plot a heatmap of correlation of produce in difference years

year_df = df.iloc[:,10:]
fig, ax = plt.subplots(figsize=(16,10))
sns.heatmap(year_df.corr(), ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a23b4b128>

So, we gather that a given year's production is more similar to its immediate previous and immediate following years.

f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex='col', sharey='row', figsize=(10,10))
ax1.set(xlabel='Y1968', ylabel='Y1961')
ax2.set(xlabel='Y1968', ylabel='Y1963')
ax3.set(xlabel='Y1968', ylabel='Y1986')
ax4.set(xlabel='Y1968', ylabel='Y2013')
sns.jointplot(x="Y1968", y="Y1961", data=df, kind="reg", ax=ax1)
sns.jointplot(x="Y1968", y="Y1963", data=df, kind="reg", ax=ax2)
sns.jointplot(x="Y1968", y="Y1986", data=df, kind="reg", ax=ax3)
sns.jointplot(x="Y1968", y="Y2013", data=df, kind="reg", ax=ax4)
plt.close(2)
plt.close(3)
plt.close(4)
plt.close(5)
/anaconda3/lib/python3.7/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

Heatmap of production of food items over years

This will detect the items whose production has drastically increased over the years

new_item_df = item_df.drop(["Item_Name","Sum","Production_Rank"], axis = 1)
fig, ax = plt.subplots(figsize=(12,24))
sns.heatmap(new_item_df,ax=ax)
ax.set_yticklabels(item_df.Item_Name.values[::-1])
plt.show()

There is considerable growth in production of Palmkernel oil, Meat/Aquatic animals, ricebran oil, cottonseed, seafood, offals, roots, poultry meat, mutton, bear, cocoa, coffee and soyabean oil. There has been exceptional growth in production of onions, cream, sugar crops, treenuts, butter/ghee and to some extent starchy roots.

Now, we look at clustering.

What is clustering?

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

Today, we will form clusters to classify countries based on productivity scale

For this, we will use k-means clustering algorithm.

K-means clustering

(Source Wikipedia ) http://gdurl.com/5BbP

This is the data we will use.

new_df.head()
Y1961 Y1962 Y1963 Y1964 Y1965 Y1966 Y1967 Y1968 Y1969 Y1970 ... Y2006 Y2007 Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Mean_Produce Rank
Afghanistan 9481.0 9414.0 9194.0 10170.0 10473.0 10169.0 11289.0 11508.0 11815.0 10454.0 ... 18317.0 19248.0 19381.0 20661.0 21030.0 21100.0 22706.0 23007.0 13003.056604 69.0
Albania 1706.0 1749.0 1767.0 1889.0 1884.0 1995.0 2046.0 2169.0 2230.0 2395.0 ... 6911.0 6744.0 7168.0 7316.0 7907.0 8114.0 8221.0 8271.0 4475.509434 104.0
Algeria 7488.0 7235.0 6861.0 7255.0 7509.0 7536.0 7986.0 8839.0 9003.0 9355.0 ... 51067.0 49933.0 50916.0 57505.0 60071.0 65852.0 69365.0 72161.0 28879.490566 38.0
Angola 4834.0 4775.0 5240.0 5286.0 5527.0 5677.0 5833.0 5685.0 6219.0 6460.0 ... 28247.0 29877.0 32053.0 36985.0 38400.0 40573.0 38064.0 48639.0 13321.056604 68.0
Antigua and Barbuda 92.0 94.0 105.0 95.0 84.0 73.0 64.0 59.0 68.0 77.0 ... 110.0 122.0 115.0 114.0 115.0 118.0 113.0 119.0 83.886792 172.0

5 rows Ă— 55 columns

X = new_df.iloc[:,:-2].values

X = pd.DataFrame(X)
X = X.convert_objects(convert_numeric=True)
X.columns = year_list
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: convert_objects is deprecated.  To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  after removing the cwd from sys.path.

Elbow method to select number of clusters

This method looks at the percentage of variance explained as a function of the number of clusters: One should choose a number of clusters so that adding another cluster doesn't give much better modeling of the data. More precisely, if one plots the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph. The number of clusters is chosen at this point, hence the "elbow criterion". This "elbow" cannot always be unambiguously identified. Percentage of variance explained is the ratio of the between-group variance to the total variance, also known as an F-test. A slight variation of this method plots the curvature of the within group variance.

Basically, number of clusters = the x-axis value of the point that is the corner of the "elbow"(the plot looks often looks like an elbow)

from sklearn.cluster import KMeans
wcss = []
for i in range(1,11):
    kmeans = KMeans(n_clusters=i,init='k-means++',max_iter=300,n_init=10,random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1,11),wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

As the elbow corner coincides with x=2, we will have to form 2 clusters. Personally, I would have liked to select 3 to 4 clusters. But trust me, only selecting 2 clusters can lead to best results. Now, we apply k-means algorithm.

kmeans = KMeans(n_clusters=2,init='k-means++',max_iter=300,n_init=10,random_state=0) 
y_kmeans = kmeans.fit_predict(X)

X = X.as_matrix(columns=None)
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  after removing the cwd from sys.path.

Now, let's visualize the results.

plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0,1],s=100,c='red',label='Others')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1,1],s=100,c='blue',label='China(mainland),USA,India')
plt.scatter(kmeans.cluster_centers_[:,0],kmeans.cluster_centers_[:,1],s=300,c='yellow',label='Centroids')
plt.title('Clusters of countries by Productivity')
plt.legend()
plt.show()

So, the blue cluster represents China(Mainland), USA and India while the red cluster represents all the other countries. This result was highly probable. Just take a look at the plot of cell 3 above. See how China, USA and India stand out. That has been observed here in clustering too.

You should try this algorithm for 3 or 4 clusters. Looking at the distribution, you will realise why 2 clusters is the best choice for the given data

This is not the end! More is yet to come.

Now, lets try to predict the production using regression for 2020. We will predict the production for USA,India and Pakistan.

india_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
    x=df[(df.Area=='India') & (df.Element=='Food')][i].mean()
    india_list.append(x)    

reset=[]
for i in year_list:
    reset.append(int(i[1:]))


reset=np.array(reset)
reset=reset.reshape(-1,1)


india_list=np.array(india_list)
india_list=india_list.reshape(-1,1)


reg = LinearRegression()
reg.fit(reset,india_list)
predictions = reg.predict(reset)
plt.title("India")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,india_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))

df[(df.Area=='India') & (df.Element=='Food')]['Y1961'].mean()

df[(df.Area=='Pakistan') & (df.Element=='Food')]

Pak_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
    yx=df[(df.Area=='Pakistan') & (df.Element=='Food')][i].mean()
    Pak_list.append(yx)   

Pak_list=np.array(Pak_list)
Pak_list=Pak_list.reshape(-1,1)
Pak_list
reg = LinearRegression()
reg.fit(reset,Pak_list)
predictions = reg.predict(reset)
plt.title("Pakistan")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,Pak_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))



usa_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
    xu=df[(df.Area=='United States of America') & (df.Element=='Food')][i].mean()
    usa_list.append(xu)

usa_list=np.array(usa_list)
usa_list=india_list.reshape(-1,1)


reg = LinearRegression()
reg.fit(reset,usa_list)
predictions = reg.predict(reset)
plt.title("USA")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,usa_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
&lt;ipython-input-24-da7cfa1c86d1&gt; in &lt;module&gt;
     27 plt.plot(reset,predictions)
     28 plt.show()
---&gt; 29 print(reg.predict(2020))
     30 
     31 df[(df.Area=='India') & (df.Element=='Food')]['Y1961'].mean()

/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/base.py in predict(self, X)
    211             Returns predicted values.
    212         """
--&gt; 213         return self._decision_function(X)
    214 
    215     _preprocess_data = staticmethod(_preprocess_data)

/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/base.py in _decision_function(self, X)
    194         check_is_fitted(self, "coef_")
    195 
--&gt; 196         X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
    197         return safe_sparse_dot(X, self.coef_.T,
    198                                dense_output=True) + self.intercept_

/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    543                     "Reshape your data either using array.reshape(-1, 1) if "
    544                     "your data has a single feature or array.reshape(1, -1) "
--&gt; 545                     "if it contains a single sample.".format(array))
    546             # If input is 1D raise error
    547             if array.ndim == 1:

ValueError: Expected 2D array, got scalar array instead:
array=2020.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.