Use Google BigQuery to Know the Top 100 Popular Python Packages in 2024

Bruce Wen
3 min read · Feb 28, 2024

Statistical data is charming.

Python Scatter Chart

Python developers download all kinds of Python packages from PyPI. According to the Python Packaging User Guide page “Analyzing PyPI package downloads”, PyPI does not display download statistics for a number of reasons. As an alternative, the Linehaul project streams download logs from PyPI to Google BigQuery, where they are stored as a public dataset.

The download logs live in the public table bigquery-public-data.pypi.file_downloads, which is accessible from Google BigQuery.

I used Google Cloud on its 90-day trial. To run SQL against the data, I logged in to Google Cloud and opened BigQuery Studio.

Click the “Create SQL query” button to open a new SQL query editor. I ran the SQL below to get the result.

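-- Count downloads per (project, version) pair over the last 60 days.
-- BETWEEN is inclusive, so "59 days ago" through today covers 60 dates.
SELECT * FROM
(
  SELECT COUNT(*) AS download_count, project, file.version
  FROM `bigquery-public-data.pypi.file_downloads`
  WHERE
    DATE(timestamp)
      BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 59 DAY)
      AND CURRENT_DATE()
  GROUP BY
    project, file.version
) sub
-- Keep only the 100 most-downloaded (project, version) rows.
ORDER BY
  download_count DESC
LIMIT 100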
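
For readers who would rather run the query from a script than from BigQuery Studio, here is a minimal Python sketch. It is not what I used for this article. The function names build_top_downloads_query and run_query are hypothetical. run_query assumes the google-cloud-bigquery package is installed and that default Google Cloud credentials and a project are already configured. The generated SQL drops the outer SELECT * wrapper, which should not change the result.

```python
from textwrap import dedent


def build_top_downloads_query(days: int = 60, limit: int = 100) -> str:
    """Return SQL for the most-downloaded (project, version) pairs.

    Hypothetical helper: `days` is the number of calendar days to cover,
    `limit` is the number of rows to keep.
    """
    if days < 1 or limit < 1:
        raise ValueError("days and limit must be positive integers")
    # BETWEEN is inclusive on both ends, so a window of `days` calendar
    # days starts (days - 1) days before today.
    return dedent(f"""\
        SELECT COUNT(*) AS download_count, project, file.version
        FROM `bigquery-public-data.pypi.file_downloads`
        WHERE DATE(timestamp)
          BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL {days - 1} DAY)
          AND CURRENT_DATE()
        GROUP BY project, file.version
        ORDER BY download_count DESC
        LIMIT {limit}
    """)


def run_query(sql: str) -> list:
    """Run `sql` on BigQuery and return the rows as a list of dicts.

    Assumption: google-cloud-bigquery is installed and authenticated.
    """
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default project and credentials
    return [dict(row.items()) for row in client.query(sql).result()]
```

Calling build_top_downloads_query() with its defaults produces the same 60-day, top-100 window as the query above. Passing the result to run_query should return one dict per row with the keys download_count, project, and version.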
