ORSR Scraper

This application retrieves all records in ORSR (the Slovak Business Register) that were changed on the current day.

The application consists of two parts:

1. Scraper:

  • gets the data of all changed records
    • either the "aktuálna" (current) or the "úplná" (full) version
    • can use a SOCKS5 proxy
  • stores the data in a MongoDB

2. Flask app:

  • Minimal Flask app with two endpoints:
    • /detail with parameter ico
      • returns JSON data for the record with the given ico
    • /list
      • returns a paginated list of records, each with its ico and obchodneMeno
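How the Flask app paginates /list is not described here, so the following is only a minimal sketch of what a paginated response with ico and obchodneMeno could look like. The function name `paginate`, the `page` and `per_page` parameters, and the shape of the returned dictionary are all assumptions made for illustration.

```python
def paginate(records, page=1, per_page=20):
    """Return one page of records, reduced to ico and obchodneMeno.

    Hypothetical helper: the real app may paginate in the database instead.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    # Keep only the two fields the /list endpoint is described as returning.
    items = [
        {"ico": r.get("ico"), "obchodneMeno": r.get("obchodneMeno")}
        for r in records[start:start + per_page]
    ]
    total = len(records)
    pages = (total + per_page - 1) // per_page  # ceiling division
    return {"page": page, "per_page": per_page, "total": total,
            "pages": pages, "items": items}
```

Slicing an in-memory list keeps the sketch self-contained. With a real MongoDB collection the same page arithmetic would more likely be applied in the query.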

Setup

0. Prerequisites

You need the following installed or accessible:

  • a current version of Python
  • MongoDB
  • SOCKS5 proxy (optional)

Installing these is outside the scope of this README.

1. Download the app

Download or clone the application.

2. venv and requirements

Open a terminal, cd to the app folder, and create a virtual environment:

cd [appPath]
python -m venv venv

Install the requirements from requirements.txt:

venv/bin/pip install -r requirements.txt

For Windows:
venv\Scripts\pip.exe install -r requirements.txt

3. Config File

There is a default config file, "config_base.cfg". For local changes, copy this base config file and save the copy as "config.cfg". The config file has the following structure:

[DB]
MONGODB_URI = mongodb://localhost:27017
MONGODB_DB = softone
MONGODB_COLLECTION = orsr

[WEB]
BASE_URL = https://www.orsr.sk/
ENDPOINT = hladaj_zmeny.asp

[PROXY]
#HTTP_PROXY = socks5://user:pass@host:port
#HTTPS_PROXY = socks5://user:pass@host:port

[APP]
THREADS = 8

Set up the connection to MongoDB, the number of threads used for collecting the data, and optionally the SOCKS5 proxy parameters.
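To show how these settings fit together, here is a minimal sketch that parses a config of the structure above with Python's standard configparser. The function name `load_settings` and the keys of the returned dictionary are assumptions for illustration; the application's own loading code may differ.

```python
import configparser

# Same structure as config_base.cfg above, inlined so the sketch runs standalone.
SAMPLE = """
[DB]
MONGODB_URI = mongodb://localhost:27017
MONGODB_DB = softone
MONGODB_COLLECTION = orsr

[WEB]
BASE_URL = https://www.orsr.sk/
ENDPOINT = hladaj_zmeny.asp

[PROXY]
#HTTP_PROXY = socks5://user:pass@host:port
#HTTPS_PROXY = socks5://user:pass@host:port

[APP]
THREADS = 8
"""


def load_settings(text):
    """Parse the config text into a plain dict (hypothetical helper)."""
    # interpolation=None so a '%' in a proxy password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    # Commented-out proxy lines are simply absent, so the dict stays empty.
    proxies = {}
    for scheme in ("http", "https"):
        value = parser.get("PROXY", f"{scheme.upper()}_PROXY", fallback=None)
        if value:
            proxies[scheme] = value

    return {
        "mongodb_uri": parser.get("DB", "MONGODB_URI"),
        "mongodb_db": parser.get("DB", "MONGODB_DB"),
        "mongodb_collection": parser.get("DB", "MONGODB_COLLECTION"),
        "url": parser.get("WEB", "BASE_URL") + parser.get("WEB", "ENDPOINT"),
        "threads": parser.getint("APP", "THREADS"),
        "proxies": proxies,
    }
```

To read the real file, `parser.read("config.cfg")` would replace `read_string`. The `proxies` dictionary uses the `{"http": ..., "https": ...}` shape that HTTP clients such as requests accept. Whether the scraper passes it on in that form is an assumption.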

Run the applications

1. Scraper

Run the scraper with:

venv/bin/python scraper.py

For Windows:
venv\Scripts\python.exe scraper.py

The scraper asks whether you want to download the "aktuálny" (current) or "úplný" (full) record.

Note: the tqdm progress bar sometimes continues on a new line when used with ThreadPool.
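For orientation, fanning downloads out over a thread pool sized by the THREADS setting might look like the sketch below. It assumes `multiprocessing.pool.ThreadPool` from the standard library. The names `scrape_all` and `fetch` are hypothetical, and the scraper's actual structure may differ.

```python
from multiprocessing.pool import ThreadPool


def scrape_all(urls, fetch, threads=8):
    """Call fetch(url) for every URL on a pool of worker threads.

    `fetch` is any callable taking a URL and returning the parsed record;
    it is injected here so the sketch does not depend on the network.
    """
    results = {}
    with ThreadPool(processes=threads) as pool:
        # imap_unordered yields results as workers finish, in any order,
        # so each result is paired with its URL before being returned.
        for url, data in pool.imap_unordered(lambda u: (u, fetch(u)), urls):
            results[url] = data
    return results
```

A progress bar would typically wrap the iterator returned by `imap_unordered`. Because results arrive from several threads in no fixed order, output from the workers can interleave with the bar.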

2. Flask

Start the Flask application:

venv/bin/python flaskapp.py

For Windows:
venv\Scripts\python.exe flaskapp.py

Now you can get the data from the local development server, which usually runs at http://127.0.0.1:5000.
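As a rough illustration of querying the running server, the sketch below builds the endpoint URLs and fetches JSON using only the standard library. It assumes that ico is passed as a query-string parameter. The helper names and the sample ICO value are placeholders, not taken from the application.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

BASE = "http://127.0.0.1:5000"  # default address of the local Flask server


def detail_url(ico, base=BASE):
    """URL for one record; assumes ico is sent as ?ico=... (not confirmed)."""
    return urljoin(base, "/detail") + "?" + urlencode({"ico": ico})


def list_url(base=BASE):
    """URL of the paginated list endpoint."""
    return urljoin(base, "/list")


def fetch_json(url, timeout=10):
    """GET the URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage, with the Flask app running (12345678 is a placeholder ICO):
#   record = fetch_json(detail_url("12345678"))
#   listing = fetch_json(list_url())
```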
