Download Kaggle Datasets with API in Jupyter Notebook

    Kaggle is a popular platform for data science and machine learning enthusiasts. It provides access to a vast collection of datasets that can be used for analysis, research, and model training.

    There are two ways to download datasets from Kaggle: manually from the dataset page, or through the Kaggle API.

    In this post, you will learn how to use the Kaggle API to download datasets directly into your Jupyter Notebook. The post walks you through setting up the Kaggle API and downloading datasets step by step.

    What is Kaggle API?

    The Kaggle API is a command-line tool that allows users to interact with Kaggle directly from their terminal or Jupyter Notebook. It enables users to search for datasets, download files, and even submit entries to machine learning competitions.

    By integrating the Kaggle API with Python-based environments like Jupyter Notebook, data scientists can streamline their workflow, automate dataset retrieval, and eliminate the need for manual downloads. This makes data preprocessing faster and more efficient.

    How to Set Up Kaggle API on Jupyter Notebook

    Step 1: Create a Kaggle Account

    If you don’t have a Kaggle account, visit Kaggle’s website and sign up for a free account.

    Step 2: Generate and Download the Kaggle API Token

    1. Log in to your Kaggle account.
    2. Go to your account profile at the top right corner.
    3. Click on Settings.
    4. Scroll down to the API section.
    5. Click on Create New Token. A file named kaggle.json will be downloaded.

    Step 3: Move the Kaggle API File to the Appropriate Directory

    1. Move kaggle.json to the appropriate location.
      • For Windows, place it in C:\Users\YourName\.kaggle\
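      • For Mac/Linux, place it in ~/.kaggle/ (to keep it somewhere else, such as your Jupyter root folder, set the KAGGLE_CONFIG_DIR environment variable to that folder)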
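
    If you prefer to do this step from a notebook cell, the move can be sketched in Python. This is a minimal sketch, not part of the Kaggle tooling: the function name install_kaggle_token and its config_dir parameter are made up for illustration, and it assumes the default ~/.kaggle location described above.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_token(downloaded_token, config_dir=None):
    """Copy kaggle.json into the Kaggle config folder and return the new path.

    install_kaggle_token and config_dir are illustrative names, not Kaggle APIs.
    """
    # Default to the .kaggle folder inside the user's home directory.
    target_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / "kaggle.json"
    shutil.copy(downloaded_token, target)

    # Owner-only read/write, the Python equivalent of chmod 600.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

    For example, install_kaggle_token(Path.home() / "Downloads" / "kaggle.json") would copy the token from a Downloads folder, assuming that is where your browser saved it.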

    Step 4: Install the Kaggle API

    Run the following command in a Jupyter Notebook cell to install the Kaggle API:

    !pip install kaggle

    Alternatively, run the following command in your computer's terminal:

    pip install kaggle

    Once installed, the Kaggle API is ready to use.
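
    To confirm that the installation worked, you can ask Python which version of the package it sees. The helper below is a small sketch using the standard library; installed_version is just an illustrative name.

```python
from importlib import metadata


def installed_version(package="kaggle"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if kaggle is installed in this environment, otherwise None.
print("kaggle version:", installed_version())
```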

    How to Download Kaggle Datasets in Jupyter Notebook

    Step 1: Find the Dataset Name or URL on Kaggle

    1. Open Kaggle and navigate to the dataset you want to download.
    2. Click on the Download button and copy the dataset API ID.
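
    If you only have the dataset page URL, the ID can usually be read from it. The sketch below assumes dataset URLs have the shape https://www.kaggle.com/datasets/owner/dataset-name; the helper name dataset_id_from_url is hypothetical.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url):
    """Extract 'owner/dataset-name' from a Kaggle dataset page URL.

    Assumes the path looks like /datasets/<owner>/<dataset-name>[/...].
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"Not a Kaggle dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"
```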

    Step 2: Use the Kaggle API Command to Download Datasets

    1. Open Jupyter Notebook.
    2. Enter the following command:
    !kaggle datasets download -d dataset-owner/dataset-name

    Replace dataset-owner/dataset-name with the actual dataset ID.

    For example, to download the dataset with the ID ankushpanday1/prostate-cancer-prediction-dataset, I enter the following command:

    !kaggle datasets download -d ankushpanday1/prostate-cancer-prediction-dataset
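
    The same download can also be done from Python instead of the command line. The sketch below uses the KaggleApi client that ships with the kaggle package; the wrapper name download_dataset and its ID check are my own additions, and it assumes your kaggle.json is already in place.

```python
def download_dataset(dataset_id, dest=".", unzip=False):
    """Download a dataset with the kaggle package's Python client.

    download_dataset is an illustrative wrapper, not a Kaggle function.
    """
    # Catch obviously malformed IDs before contacting Kaggle.
    owner, sep, name = dataset_id.partition("/")
    if not (owner and sep and name) or "/" in name:
        raise ValueError("dataset_id must look like 'owner/dataset-name'")

    # Imported here because importing kaggle looks for credentials right away.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # reads kaggle.json from the Kaggle config folder
    api.dataset_download_files(dataset_id, path=dest, unzip=unzip)
```

    Calling download_dataset("ankushpanday1/prostate-cancer-prediction-dataset") should fetch the same ZIP file as the command above.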

    Step 3: Extract the Dataset if It’s in a ZIP File

    If the dataset is downloaded as a ZIP file, extract it using:

    !unzip dataset-name.zip

    In the example above, the downloaded file is prostate-cancer-prediction-dataset.zip, so I enter the following command to unzip it:

    !unzip prostate-cancer-prediction-dataset.zip
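
    The unzip command may not be available on every system (Windows, for example). As an alternative, here is a minimal sketch using Python's built-in zipfile module. Note one difference: this version extracts into a folder named after the ZIP file rather than into the current directory. The function name extract_dataset is illustrative.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest=None):
    """Extract a downloaded dataset archive and return the names of its files."""
    zip_path = Path(zip_path)
    # Default destination: a folder with the ZIP's name minus the .zip suffix.
    dest = Path(dest) if dest else zip_path.with_suffix("")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```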

    Step 4: Load the Dataset into Pandas for Analysis

    To load the dataset into a Pandas DataFrame, use:

    import pandas as pd
    df = pd.read_csv("dataset-file.csv")
    df.head()

    Replace dataset-file.csv with the actual file name from the dataset.

    In the example above, I enter this command to view the dataset:

    df = pd.read_csv("prostate_cancer_prediction.csv")
    df
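
    If you are not sure what the CSV file inside the archive is called, you can list the CSV files first and then load one. This is a rough sketch; find_csv_files and load_first_csv are hypothetical helper names, and loading the first file is only a convenient default.

```python
from pathlib import Path


def find_csv_files(folder="."):
    """Return all CSV files under folder (including subfolders), sorted by path."""
    return sorted(Path(folder).rglob("*.csv"))


def load_first_csv(folder="."):
    """Load the first CSV found under folder into a DataFrame (needs pandas)."""
    import pandas as pd  # imported here so find_csv_files works without pandas

    csv_files = find_csv_files(folder)
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found under {folder}")
    return pd.read_csv(csv_files[0])
```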

    Troubleshooting Common Issues

    Error: “403 Forbidden” – Permission denied

    This error can mean the dataset ID is wrong. Go back to the Kaggle dataset page and copy the ID exactly as shown. Here’s an example:

    [Screenshot: resolving the Kaggle “403 Forbidden” – Permission denied error]

    In the screenshot above, the original ID ends with ‘dataset’, not ‘datasets’, and that mismatch caused the error. After correcting the ID, I could download the dataset.

    Error: “401 Unauthorized” or “403 Forbidden” – Fix Authentication Issues

    This error occurs when the API authentication fails. Ensure that:

    • The kaggle.json file is correctly placed in ~/.kaggle/ (Linux/Mac) or C:\Users\YourName\.kaggle\ (Windows).
    • The API token is still valid. If in doubt, create a new token on Kaggle and replace the old kaggle.json with the new one.
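
    A quick way to rule out a broken token file is to check that it exists and contains both expected fields. The sketch below assumes kaggle.json is a JSON object with "username" and "key" entries; check_token is a made-up helper, and it cannot tell whether Kaggle still accepts the key.

```python
import json
from pathlib import Path


def check_token(path):
    """Return a list of problems found in a kaggle.json file (empty if none)."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        token = json.loads(path.read_text())
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(token, dict):
        return [f"{path} does not contain a JSON object"]
    # Both fields must be present and non-empty.
    return [
        f"missing or empty field: {field}"
        for field in ("username", "key")
        if not token.get(field)
    ]
```

    For example, check_token(Path.home() / ".kaggle" / "kaggle.json") returns an empty list when the file looks fine.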

    Error: “No module named kaggle” – Ensure Proper Installation

    This error means the Kaggle API module is not installed. Run the following command in Jupyter Notebook:

    !pip install kaggle
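
    If the error persists after installing, one possible cause is that pip installed the package into a different Python environment than the one your notebook kernel uses. A hedged workaround is to call pip through the kernel's own interpreter, as sketched below; pip_install_command and install are illustrative names.

```python
import subprocess
import sys


def pip_install_command(package="kaggle", upgrade=False):
    """Build a pip command bound to the interpreter running this notebook."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    return cmd


def install(package="kaggle", upgrade=False):
    """Run the pip command; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package, upgrade))
```

    Running install() in a cell installs kaggle for the current kernel. You may need to restart the kernel before the import works.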

    Error: “FileNotFoundError: kaggle.json” – Fix Incorrect File Path

    If the Kaggle API cannot find kaggle.json, make sure:

    • It is correctly placed in the appropriate directory.
    • You have used the command !mkdir -p ~/.kaggle && mv kaggle.json ~/.kaggle/ (for Linux/Mac users) to move it to the right location.

    Error: “Permission denied” – Adjust File Permissions

    If you face a permission error, run this command to update file permissions:

    !chmod 600 ~/.kaggle/kaggle.json
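
    On Mac/Linux you can also check the permissions from Python before changing them. This is a small sketch with made-up helper names; on Windows the permission bits behave differently, so treat it as Mac/Linux only.

```python
import os
import stat


def token_is_private(path):
    """True if group and other users have no access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def make_private(path):
    """Restrict the file to owner read/write, like chmod 600."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```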

    FAQs

    Can I use Kaggle API without a Kaggle account?

    No, you need a Kaggle account to access the API.

    Where should I place the kaggle.json file?

    Place it in:
    C:\Users\YourName\.kaggle\ (Windows)
    ~/.kaggle/ (Linux/Mac)

    How do I update the Kaggle API?

    Run the following command to update:
    !pip install --upgrade kaggle

    Can I download multiple datasets at once?

    Not with a single command. Each dataset needs its own download command, although you can run several of those commands one after another.
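
    Running one download per dataset is easy to automate with a loop. The sketch below is one way to do it, assuming the kaggle command is installed and authenticated; download_many and cli_download are hypothetical names.

```python
import subprocess


def cli_download(dataset_id):
    """Download one dataset with the same CLI command used in this guide."""
    subprocess.run(
        ["kaggle", "datasets", "download", "-d", dataset_id], check=True
    )


def download_many(dataset_ids, download_one=cli_download):
    """Download each dataset in turn; return (succeeded, failed)."""
    succeeded, failed = [], {}
    for dataset_id in dataset_ids:
        try:
            download_one(dataset_id)
            succeeded.append(dataset_id)
        except Exception as exc:  # keep going so one bad ID does not stop the rest
            failed[dataset_id] = str(exc)
    return succeeded, failed
```

    The download_one parameter lets you swap in a different download function, for example one that also unzips the files.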

    Does Kaggle API work on Google Colab?

    Yes, but you must upload kaggle.json every session.
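
    As a possible alternative to placing the file on disk each session, the Kaggle client can also read credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables. The sketch below copies the two values from the text of a kaggle.json file into the environment; export_token_to_env is an illustrative name, and how you get the token text into the session is up to you.

```python
import json
import os


def export_token_to_env(token_text, env=None):
    """Set KAGGLE_USERNAME and KAGGLE_KEY from the contents of kaggle.json."""
    env = os.environ if env is None else env
    token = json.loads(token_text)
    env["KAGGLE_USERNAME"] = token["username"]
    env["KAGGLE_KEY"] = token["key"]
```

    Call it before running any kaggle command in the session so the credentials are already set when the client looks for them.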

    Summary

    In this guide, we covered how to set up and use the Kaggle API to download datasets directly into Jupyter Notebook. By automating the dataset retrieval process, you can save time and focus on data analysis and machine learning tasks.

    Key Takeaways:

    • The Kaggle API allows seamless interaction with Kaggle datasets.
    • You need to create an account and download an API token to authenticate.
    • Proper installation and setup of kaggle.json are crucial to avoid authentication errors.
    • The API makes it easy to download, extract, and load datasets for analysis.
    • Troubleshooting common issues ensures smooth operation.

    By leveraging the Kaggle API, you can enhance your data science workflow. Start exploring datasets today and streamline your machine-learning projects!

    If you have any questions or experiences to share, feel free to leave a comment below!

