DATA MEDIA LAB

goodreadsbooks

Good books worth reading

อ.มงดล กิจมนูญ
Created: 18 Jul 2019 09:56:39


Description

The primary reason for creating this dataset is the need for a good, clean dataset of books. Being a bookie myself (see what I did there?), I had searched Kaggle itself for datasets on books, and I found that while most of them listed a good number of books, they either a) were missing major columns or b) contained grossly unclean data. I mean, you can't determine how good a book is from just a few text reviews, come on! What I needed were numbers: solid integers and floats that say how many people liked or hated the book, how much they liked it, and so on. Even the one good dataset I found, though well cleaned, was split across a number of interlinked files, which added to the hassle. This prompted me to use the Goodreads API to build a well-cleaned dataset with only the promising features (minus the redundant ones), and the result is the dataset you're looking at now.

Columns

bookID (Integer)

title (String)

authors (String)

average_rating (Float)

isbn (String)

isbn13 (Integer)

language_code (String)

num_pages (Integer)

ratings_count (Integer)

text_reviews_count (Integer)
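For readers loading the data in Python, here is a minimal sketch of these columns as a typed record. It assumes the data is a CSV file whose header row uses exactly the column names above; the `Book` class and `parse_books` function are hypothetical names for illustration, not part of the dataset. The sketch treats title, authors and language_code as text, average_rating as a float, and isbn as text, because an ISBN-10 can begin with 0 or end in "X" and would be damaged by integer conversion.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Book:
    """One row of the dataset (hypothetical record type for illustration)."""
    book_id: int
    title: str
    authors: str
    average_rating: float
    isbn: str  # text, not int: ISBN-10 values can start with 0 or end in "X"
    isbn13: int
    language_code: str
    num_pages: int
    ratings_count: int
    text_reviews_count: int


def parse_books(csv_text: str) -> list[Book]:
    """Parse CSV text whose header row uses the column names listed above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is not None:
        # Defensive step: strip whitespace around header names so that,
        # e.g., " num_pages" still matches "num_pages".
        reader.fieldnames = [name.strip() for name in reader.fieldnames]
    books = []
    for row in reader:
        books.append(
            Book(
                book_id=int(row["bookID"]),
                title=row["title"],
                authors=row["authors"],
                average_rating=float(row["average_rating"]),
                isbn=row["isbn"].strip(),
                isbn13=int(row["isbn13"]),
                language_code=row["language_code"].strip(),
                num_pages=int(row["num_pages"]),
                ratings_count=int(row["ratings_count"]),
                text_reviews_count=int(row["text_reviews_count"]),
            )
        )
    return books
```

To read a downloaded file, pass its contents, e.g. `parse_books(open(path, encoding="utf-8").read())`, where `path` is wherever you saved it. Keeping the conversion in one function means any row with an unexpected value fails loudly with a `ValueError` instead of silently producing a wrong type.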