update_prices.ipynb

This notebook walks through updating the prices stored in a database, i.e. adding records to the prices table with the prices registered on the current day.

The notebook is divided into two parts:

  1. In the first part we create a minimal database with a single book and save it as database.db in the current folder.

  2. In the second part we load database.db and read it as a polars.DataFrame. We then iterate over its rows and scrape the prices from every link we find. The scraped prices are added to the prices table in the database, together with the current date.
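For reference, part 1 boils down to creating a books table and inserting one row. The sketch below is a simplified stdlib equivalent (plain sqlite3 with a reduced column set and an in-memory database); the notebook itself uses bookhoarder and polars, with the full 21-column schema.

```python
import sqlite3

# Minimal sketch of part 1: a "books" table with a single book.
# Use sqlite3.connect("database.db") instead to persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS books (
        isbn TEXT PRIMARY KEY,
        author TEXT,
        title TEXT,
        full_price REAL,
        min_price REAL,
        buecher TEXT,
        osiander TEXT
    )"""
)
conn.execute(
    "INSERT INTO books (isbn, author, title, full_price, min_price) "
    "VALUES (?, ?, ?, ?, ?)",
    ("9783866473256", "Karl Marx", "Das Kapital", 7.95, 7.95),
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM books").fetchone()[0])  # 1
```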

from datetime import datetime

import bookhoarder as bh

We load the database database.db from the current folder; it is read as a polars.DataFrame with a specific schema (given by bh.schema()).

books = bh.read_database(
    table_name="books",
    url="sqlite:///database.db",
    schema=bh.schema(),
)
books
shape: (2, 21)
(language and format are null for both rows; the remaining 12 columns, one per site (adelphi, buecher, feltrinelli, hoepli, hugendubel, ibs, libcoop, libraccio, libuni, mondadori, osiander, rizzoli), hold a shop URL or null; URLs are summarised in the last column below)

isbn           author             title             year  publisher            full_price  min_price  non-null site links
9783866473256  Karl Marx          Das Kapital       2009  Anaconda             7.95        7.95       buecher, osiander
9781857988826  Ursula K. Le Guin  The Dispossessed  1999  Orion Publishing Co  13.0        13.0       buecher, feltrinelli, ibs, osiander
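Since the database is plain SQLite, the same table can also be inspected without bookhoarder. The sketch below (an illustration with a reduced column set and an in-memory database, not the notebook's actual code path) reads rows back as dicts with the stdlib's sqlite3:

```python
import sqlite3

# Stdlib sketch of reading a "books" table; bh.read_database returns a
# polars.DataFrame instead of a list of dicts.
conn = sqlite3.connect(":memory:")  # replace with "database.db" on disk
conn.row_factory = sqlite3.Row     # rows become mapping-like objects
conn.execute("CREATE TABLE books (isbn TEXT, title TEXT, min_price REAL)")
conn.execute(
    "INSERT INTO books VALUES ('9781857988826', 'The Dispossessed', 13.0)"
)
rows = [dict(r) for r in conn.execute("SELECT * FROM books")]
print(rows)
```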

As a first step, we define empty lists for the ISBNs, sites, prices, and dates, and we store today's date.

list_isbns = []
list_sites = []
list_prices = []
list_dates = []
date = datetime.today().date()

Keep in mind that not every book has a link for every website; for example, the first book only has links for the following sites:

[s.name for s in books[0][bh.list_sites()] if not s.null_count()]
['buecher', 'osiander']
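The same non-null filter can be written with plain Python, as in the sketch below. The column subset and the row dict are hard-coded illustrations here; the notebook uses bh.list_sites() and polars null counts instead.

```python
# Keep only the site columns that actually hold a link for this book.
site_columns = ["adelphi", "buecher", "feltrinelli", "ibs", "osiander"]  # subset for brevity
row = {
    "isbn": "9783866473256",
    "adelphi": None,
    "buecher": "https://www.buecher.de/artikel...",
    "feltrinelli": None,
    "ibs": None,
    "osiander": "https://www.osiander.de/shop/h...",
}
available = [site for site in site_columns if row[site] is not None]
print(available)  # ['buecher', 'osiander']
```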

Then bh.scrape_database (called below) pairs every ISBN with each of its available links, building the aligned lists list_isbns and list_sites row by row:

list_isbns    list_sites
isbn_1        link_1 for book_1
isbn_1        link_2 for book_1
isbn_1        link_3 for book_1
isbn_2        link_1 for book_2
isbn_2        link_2 for book_2
...
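Sketched with plain dicts (a simplification with made-up sample data; the notebook works on a polars.DataFrame), the pairing above could be built like this:

```python
# Accumulate one (isbn, site) pair per non-null link, repeating the ISBN.
sample_books = [
    {"isbn": "isbn_1", "buecher": "link_1", "ibs": "link_2", "osiander": "link_3"},
    {"isbn": "isbn_2", "buecher": "link_1", "osiander": "link_2"},
]
site_columns = ["buecher", "ibs", "osiander"]

list_isbns, list_sites = [], []
for book in sample_books:
    for site in site_columns:
        if book.get(site) is not None:
            list_isbns.append(book["isbn"])
            list_sites.append(site)

print(list(zip(list_isbns, list_sites)))
```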

prices = bh.scrape_database(books)
prices
100%|██████████| 6/6 [00:04<00:00,  1.22it/s]
shape: (6, 4)

isbn           site         price  date
str            str          f64    date
9783866473256  buecher      7.95   2025-06-12
9783866473256  osiander     7.95   2025-06-12
9781857988826  buecher      9.99   2025-06-12
9781857988826  feltrinelli  13.74  2025-06-12
9781857988826  ibs          13.05  2025-06-12
9781857988826  osiander     10.99  2025-06-12
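To preview what the min-price update further below will compute, a stdlib sketch over the scraped rows (hard-coded here as tuples) finds each book's minimum price:

```python
# Group the (isbn, site, price) rows by ISBN and keep the minimum price.
prices_rows = [
    ("9783866473256", "buecher", 7.95),
    ("9783866473256", "osiander", 7.95),
    ("9781857988826", "buecher", 9.99),
    ("9781857988826", "feltrinelli", 13.74),
    ("9781857988826", "ibs", 13.05),
    ("9781857988826", "osiander", 10.99),
]

min_prices = {}
for isbn, _site, price in prices_rows:
    min_prices[isbn] = min(price, min_prices.get(isbn, float("inf")))

print(min_prices)  # {'9783866473256': 7.95, '9781857988826': 9.99}
```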

We write both tables to a new updated_database.db that stores the updated values.

bh.write_database(
    books,
    table_name="books",
    url="sqlite:///updated_database.db",
    if_table_exists="replace",
)
bh.write_database(
    prices,
    table_name="prices",
    url="sqlite:///updated_database.db",
    if_table_exists="replace",
)

After scraping the prices, we update min_price in the books table to the minimum scraped price for each book.

bh.update_min_whole_database(url="sqlite:///updated_database.db")
bh.read_database(
    table_name="books",
    url="sqlite:///updated_database.db",
    schema=bh.schema(),
)
shape: (2, 21)
(same columns as above; note that min_price for the second book dropped from 13.0 to 9.99, the lowest scraped price)

isbn           author             title             year  publisher            full_price  min_price  non-null site links
9783866473256  Karl Marx          Das Kapital       2009  Anaconda             7.95        7.95       buecher, osiander
9781857988826  Ursula K. Le Guin  The Dispossessed  1999  Orion Publishing Co  13.0        9.99       buecher, feltrinelli, ibs, osiander
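The internals of bh.update_min_whole_database are not shown in this notebook; a plausible sketch of the underlying SQL, run here with the stdlib's sqlite3 against an in-memory copy of the two tables (a subset of the real rows), is:

```python
import sqlite3

# Rewrite books.min_price from the minimum scraped price per ISBN.
conn = sqlite3.connect(":memory:")  # the notebook uses updated_database.db
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, min_price REAL)")
conn.execute("CREATE TABLE prices (isbn TEXT, site TEXT, price REAL, date TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("9783866473256", 7.95), ("9781857988826", 13.0)],
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, '2025-06-12')",
    [
        ("9783866473256", "buecher", 7.95),
        ("9781857988826", "buecher", 9.99),
        ("9781857988826", "ibs", 13.05),
    ],
)
# Correlated subquery: each book gets MIN(price) over its own price rows.
conn.execute(
    """UPDATE books
       SET min_price = (SELECT MIN(price) FROM prices
                        WHERE prices.isbn = books.isbn)
       WHERE isbn IN (SELECT isbn FROM prices)"""
)
print(conn.execute("SELECT isbn, min_price FROM books ORDER BY isbn").fetchall())
# [('9781857988826', 9.99), ('9783866473256', 7.95)]
```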