Sowing Success - Using Machine Learning to Help Farmers Select the Best Crops.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/skilfoy/dd32e79c06c103d86bd11a396d38011a/sowing-success-using-machine-learning-to-help-farmers-select-the-best-crops.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Sowing Success: Using Machine Learning to Help Farmers Select the Best Crops"
],
"metadata": {
"id": "T9i4L7iEBiAi"
},
"id": "T9i4L7iEBiAi"
},
{
"cell_type": "markdown",
"source": [
"![farmer_in_a_field.jpg]()"
],
"metadata": {
"id": "TpUb3xNdBbCO"
},
"id": "TpUb3xNdBbCO"
},
{
"source": [
"Measuring essential soil metrics such as nitrogen, phosphorous, potassium levels, and pH value is an important aspect of assessing soil condition. However, it can be an expensive and time-consuming process, which can cause farmers to prioritize which metrics to measure based on their budget constraints.\n",
"\n",
"Farmers have various options when it comes to deciding which crop to plant each season. Their primary objective is to maximize the yield of their crops, taking into account different factors. One crucial factor that affects crop growth is the condition of the soil in the field, which can be assessed by measuring basic elements such as nitrogen and potassium levels. Each crop has an ideal soil condition that ensures optimal growth and maximum yield.\n",
"\n",
"A farmer has reached out for expert ML assistance in selecting the best crop for his field. He's provided you with a dataset called `soil_measures.csv`, which contains:\n",
"\n",
"- `\"N\"`: Nitrogen content ratio in the soil\n",
"- `\"P\"`: Phosphorous content ratio in the soil\n",
"- `\"K\"`: Potassium content ratio in the soil\n",
"- `\"pH\"` value of the soil\n",
"- `\"crop\"`: categorical values that contain various crops (target variable).\n",
"\n",
"Each row in this dataset represents various measures of the soil in a particular field. Based on these measurements, the crop specified in the `\"crop\"` column is the optimal choice for that field. \n",
"\n",
"In this project, we will build multi-class classification models to predict the type of `\"crop\"` and identify the single most importance feature for predictive performance."
],
"metadata": {
"id": "d3d001b0-2e2f-4b58-8442-99520bad831f"
},
"id": "d3d001b0-2e2f-4b58-8442-99520bad831f",
"cell_type": "markdown"
},
{
"source": [
"import pandas as pd\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import metrics\n",
"from sklearn.metrics import f1_score"
],
"metadata": {
"id": "d0eb4f16-5a99-460d-a5ba-706b7ef0bbe7",
"executionTime": 11,
"lastSuccessfullyExecutedCode": "import pandas as pd\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import train_test_split\nfrom sklearn import metrics\nfrom sklearn.metrics import f1_score",
"executionCancelledAt": null,
"lastExecutedAt": 1713927001949,
"lastScheduledRunId": null,
"lastExecutedByKernel": "df6198fd-03de-45ab-ab25-e5c5f3c1b27b"
},
"id": "d0eb4f16-5a99-460d-a5ba-706b7ef0bbe7",
"cell_type": "code",
"execution_count": null,
"outputs": []
},
{
"source": [
"## Load Data and Perform Exploratory Analysis\n",
"\n",
"In this section, we load the dataset `soil_measures.csv` which contains essential soil metrics and crop type data for various fields. The metrics include Nitrogen (N), Phosphorous (P), Potassium (K), and pH levels—key indicators of soil health. We use these to predict the optimal crop ('crop' column) for each field. The first task is to inspect the first few rows of our dataset with `crops.head()` to understand the structure and types of data we're working with. Following this, we check for missing values to ensure the quality and completeness of our data, which is crucial for accurate model predictions."
],
"metadata": {
"id": "58570775-b185-46cf-a725-352b0ed79a71"
},
"cell_type": "markdown",
"id": "58570775-b185-46cf-a725-352b0ed79a71"
},
{
"source": [
"# Load the dataset\n",
"crops = pd.read_csv(\"soil_measures.csv\")\n",
"crops.head()"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 54,
"lastExecutedAt": 1713927002003,
"lastExecutedByKernel": "df6198fd-03de-45ab-ab25-e5c5f3c1b27b",
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "# Load the dataset\ncrops = pd.read_csv(\"soil_measures.csv\")\ncrops.head()",
"outputsMetadata": {
"0": {
"height": 186,
"type": "dataFrame"
}
},
"id": "d0745685-dc24-4b6f-a4bd-d06096845348",
"outputId": "e187e9e1-37d9-4164-a57d-6f0c60fd91fc"
},
"cell_type": "code",
"id": "d0745685-dc24-4b6f-a4bd-d06096845348",
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/com.datacamp.data-table.v2+json": {
"table": {
"schema": {
"fields": [
{
"name": "index",
"type": "integer"
},
{
"name": "N",
"type": "integer"
},
{
"name": "P",
"type": "integer"
},
{
"name": "K",
"type": "integer"
},
{
"name": "ph",
"type": "number"
},
{
"name": "crop",
"type": "string"
}
],
"primaryKey": [
"index"
],
"pandas_version": "1.4.0"
},
"data": {
"index": [
0,
1,
2,
3,
4
],
"N": [
90,
85,
60,
74,
78
],
"P": [
42,
58,
55,
35,
42
],
"K": [
43,
41,
44,
40,
42
],
"ph": [
6.502985292,
7.038096361,
7.840207144,
6.980400905,
7.628472891
],
"crop": [
"rice",
"rice",
"rice",
"rice",
"rice"
]
}
},
"total_rows": 5,
"truncation_type": null
},
"text/plain": " N P K ph crop\n0 90 42 43 6.502985 rice\n1 85 58 41 7.038096 rice\n2 60 55 44 7.840207 rice\n3 74 35 40 6.980401 rice\n4 78 42 42 7.628473 rice",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>N</th>\n <th>P</th>\n <th>K</th>\n <th>ph</th>\n <th>crop</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>90</td>\n <td>42</td>\n <td>43</td>\n <td>6.502985</td>\n <td>rice</td>\n </tr>\n <tr>\n <th>1</th>\n <td>85</td>\n <td>58</td>\n <td>41</td>\n <td>7.038096</td>\n <td>rice</td>\n </tr>\n <tr>\n <th>2</th>\n <td>60</td>\n <td>55</td>\n <td>44</td>\n <td>7.840207</td>\n <td>rice</td>\n </tr>\n <tr>\n <th>3</th>\n <td>74</td>\n <td>35</td>\n <td>40</td>\n <td>6.980401</td>\n <td>rice</td>\n </tr>\n <tr>\n <th>4</th>\n <td>78</td>\n <td>42</td>\n <td>42</td>\n <td>7.628473</td>\n <td>rice</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 35
}
],
"execution_count": null
},
{
"source": [
"# Check for missing values\n",
"print(\"Missing values in each column:\\n\", crops.isna().sum())"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 53,
"lastExecutedAt": 1713927002056,
"lastExecutedByKernel": "df6198fd-03de-45ab-ab25-e5c5f3c1b27b",
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "# Check for missing values\nprint(\"Missing values in each column:\\n\", crops.isna().sum())",
"outputsMetadata": {
"0": {
"height": 164,
"type": "stream"
}
},
"id": "cd7cffa5-1dd7-402e-91ff-1f9a792e7591",
"outputId": "f62fc5ed-fc4a-4cc6-9b1b-d995f780aed9"
},
"cell_type": "code",
"id": "cd7cffa5-1dd7-402e-91ff-1f9a792e7591",
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Missing values in each column:\n N 0\nP 0\nK 0\nph 0\ncrop 0\ndtype: int64\n"
}
],
"execution_count": null
},
{
"source": [
"# Check unique values in 'crop'\n",
"print(\"Unique crop types:\", crops['crop'].unique())"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 47,
"lastExecutedAt": 1713927002103,
"lastExecutedByKernel": "df6198fd-03de-45ab-ab25-e5c5f3c1b27b",
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "# Check unique values in 'crop'\nprint(\"Unique crop types:\", crops['crop'].unique())",
"outputsMetadata": {
"0": {
"height": 164,
"type": "stream"
}
},
"id": "f3757346-b919-4a17-b2cc-ba064b66d5d4",
"outputId": "4aea943a-cb03-4a50-efb8-537a0d2ec988"
},
"cell_type": "code",
"id": "f3757346-b919-4a17-b2cc-ba064b66d5d4",
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Unique crop types: ['rice' 'maize' 'chickpea' 'kidneybeans' 'pigeonpeas' 'mothbeans'\n 'mungbean' 'blackgram' 'lentil' 'pomegranate' 'banana' 'mango' 'grapes'\n 'watermelon' 'muskmelon' 'apple' 'orange' 'papaya' 'coconut' 'cotton'\n 'jute' 'coffee']\n"
}
],
"execution_count": null
},
{
"source": [
"## Prepare the Data\n",
"\n",
"Here, we prepare our dataset for the machine learning model. This involves splitting the dataset into features (`X`) and the target variable (`y`). The `crop` column is the target that our model will learn to predict, and the remaining columns are features that provide the inputs to our model. We then split these into training and testing sets using `train_test_split`. This step is vital as it helps in validating the performance of our model on unseen data, ensuring that our predictions will be robust in practical scenarios."
],
"metadata": {
"id": "bd1a4b41-7381-4c31-b054-107d14817e95"
},
"cell_type": "markdown",
"id": "bd1a4b41-7381-4c31-b054-107d14817e95"
},
{
"source": [
"# Features and target\n",
"X = crops.drop('crop', axis=1)\n",
"y = crops['crop']\n",
"\n",
"# Splitting the dataset\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 56,
"lastExecutedAt": 1713927002159,
"lastExecutedByKernel": "df6198fd-03de-45ab-ab25-e5c5f3c1b27b",
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "# Features and target\nX = crops.drop('crop', axis=1)\ny = crops['crop']\n\n# Splitting the dataset\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)",
"id": "0ef1dc35-2205-4fbb-acb2-5bcfbb718cf6"
},
"cell_type": "code",
"id": "0ef1dc35-2205-4fbb-acb2-5bcfbb718cf6",
"outputs": [],
"execution_count": null
},
{
"source": [
"## Evaluate Each Feature Individually\n",
"\n",
"In this critical section, we evaluate the predictive power of each soil metric individually using logistic regression, a powerful model suitable for multi-class classification. By assessing each feature (N, P, K, pH) separately, we can determine which single soil metric has the strongest relationship with crop type. We fit a logistic regression model for each feature and then predict the crop type on the test set. We use the F1-score as our metric because it considers both precision and recall, providing a balance that is particularly useful in multi-class settings where some classes might be underrepresented."
],
"metadata": {
"id": "57470448-e11f-4d85-ae9f-d0c85acde6aa"
},
"cell_type": "markdown",
"id": "57470448-e11f-4d85-ae9f-d0c85acde6aa"
},
{
"source": [
"# Dictionary to store F1 scores for each feature\n",
"feature_performance = {}\n",
"\n",
"# Evaluate each feature\n",
"for feature in ['N', 'P', 'K', 'ph']:\n",
" # Model initialization\n",
" log_reg = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)\n",
"\n",
" # Fit model on training data for the current feature\n",
" log_reg.fit(X_train[[feature]], y_train)\n",
"\n",
" # Predict on test data\n",
" y_pred = log_reg.predict(X_test[[feature]])\n",
"\n",
" # Calculate F1 score and update dictionary\n",
" f1 = f1_score(y_test, y_pred, average='weighted')\n",
" feature_performance[feature] = f1\n",
" print(f\"F1-score for {feature}: {f1}\")"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": null,
"lastExecutedAt": null,
"lastExecutedByKernel": null,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": null,
"outputsMetadata": {
"0": {
"height": 101,
"type": "stream"
}
},
"id": "8d26265e-152d-4f9f-843a-8d9c31abc1ce",
"outputId": "301ccef7-a3c8-41db-ea6a-fa74a0e03a38"
},
"cell_type": "code",
"id": "8d26265e-152d-4f9f-843a-8d9c31abc1ce",
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "F1-score for N: 0.10611720057539394\nF1-score for P: 0.1301839798192734\nF1-score for K: 0.2006673728470238\nF1-score for ph: 0.06787631271947597\n"
}
],
"execution_count": null
},
{
"source": [
"## Determine the Best Predictive Feature\n",
"\n",
"After evaluating each feature, we identify which one has the highest F1-score, indicating the strongest individual predictor of crop type. This analysis helps in understanding the relative importance of each soil measure in predicting the optimal crop, which can significantly aid decision-making in agricultural planning and management."
],
"metadata": {
"id": "e70ac7f4-db48-46d2-8f65-c58c07d7dc12"
},
"cell_type": "markdown",
"id": "e70ac7f4-db48-46d2-8f65-c58c07d7dc12"
},
{
"source": [
"# Find the best predictive feature based on F1 score\n",
"best_feature = max(feature_performance, key=feature_performance.get)\n",
"best_predictive_feature = {best_feature: feature_performance[best_feature]}\n",
"\n",
"# Output the best predictive feature\n",
"print(\"Best predictive feature and score:\", best_predictive_feature)"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 52,
"lastExecutedAt": 1713927055744,
"lastExecutedByKernel": "df6198fd-03de-45ab-ab25-e5c5f3c1b27b",
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "# Find the best predictive feature based on F1 score\nbest_feature = max(feature_performance, key=feature_performance.get)\nbest_predictive_feature = {best_feature: feature_performance[best_feature]}\n\n# Output the best predictive feature\nprint(\"Best predictive feature and score:\", best_predictive_feature)",
"outputsMetadata": {
"0": {
"height": 38,
"type": "stream"
}
},
"id": "b46e6baa-67d7-45f6-8fe6-dc0e927d4bb7",
"outputId": "62a2f65d-6a90-493a-b865-15f2a388aac7"
},
"cell_type": "code",
"id": "b46e6baa-67d7-45f6-8fe6-dc0e927d4bb7",
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Best predictive feature and score: {'K': 0.2006673728470238}\n"
}
],
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"## Conclusion\n",
"\n",
"The analysis revealed that Potassium (K) content in the soil is the most predictive feature for determining the suitable crop type, with an F1-score of 0.2007. This suggests that among the soil measures, potassium levels are most closely linked to crop suitability, highlighting its critical role in crop nutrient management and soil fertility.\n",
"\n",
"**Lessons Learned:**\n",
"The approach demonstrated the value of machine learning in precision agriculture, specifically in optimizing crop selection based on simple soil measures. However, the relatively low F1-scores across all features suggest that a single soil metric might not be sufficient to accurately predict crop type, indicating the complex interplay of factors that influence crop growth.\n",
"\n",
"**Future Research:**\n",
"Further research could explore models that incorporate multiple features simultaneously, potentially improving predictive performance. Additionally, integrating other soil characteristics, such as organic matter content and moisture levels, or even climatic conditions, could provide a more holistic view and substantially enhance the model's accuracy. Engaging in feature engineering and more advanced machine learning models like Random Forests or Gradient Boosting Machines could also yield better predictions and insights into the relationships between soil conditions and crop viability."
],
"metadata": {
"id": "QzvWJMOoCRja"
},
"id": "QzvWJMOoCRja"
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "xlBqy2OsCZVy"
},
"id": "xlBqy2OsCZVy",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
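The notebook scores each single-feature model with `f1_score(..., average='weighted')`. As a rough illustration of what that number measures, the sketch below recomputes a weighted F1 by hand in plain Python: precision and recall are combined into an F1 per class, and the per-class values are averaged with weights equal to each class's share of the true labels. The helper name `weighted_f1` and the toy labels are invented for this example; they are not part of the notebook or of scikit-learn.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted `label` and actually `label`
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        # Number of times the model predicted `label` at all
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        if precision + recall == 0:
            f1 = 0.0
        else:
            f1 = 2 * precision * recall / (precision + recall)
        # Weight each class by its share of the true labels
        score += f1 * n_true / total
    return score


# Toy labels, invented purely for illustration
y_true = ["rice", "rice", "maize", "maize", "coffee", "coffee"]
y_pred = ["rice", "maize", "maize", "maize", "coffee", "rice"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Read this way, scores between 0.07 and 0.20, as reported for the single-feature models, mean that for most of the 22 crops either precision or recall (or both) is close to zero.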