ORSR Scraper
With this application you can download all records that were changed in ORSR (the Business Register of the Slovak Republic) on the current day.
The application consists of two parts:
1. Scraper:
- gets the data of all changed records
- either the "aktuálna" (current) or the "úplná" (full) version
- can use a socks5 proxy
- stores the data in a MongoDB
2. Flask app:
- A minimal Flask app with two endpoints:
- /detail with parameter ico
- returns JSON data for the record with the given ico
- /list
- returns a paginated list of records with their ico and obchodneMeno
Setup
Prerequisites
You need to have installed, or have access to:
- a current version of Python
- MongoDB
- Socks5 proxy (optional)
Installing these is outside the scope of this README.
1. Download the app
Download or clone the application.
2. venv and requirements
Open a terminal, change to the app folder, and create a virtual environment:
cd [appPath]
python -m venv venv
Install the requirements from requirements.txt:
venv/bin/pip install -r requirements.txt
On Windows:
venv\Scripts\pip.exe install -r requirements.txt
3. Config File
The default config file is "config_base.cfg". For local changes, copy this base config file and save the copy as "config.cfg". The config file has the following structure:
[DB]
MONGODB_URI = mongodb://localhost:27017
MONGODB_DB = softone
MONGODB_COLLECTION = orsr
[WEB]
BASE_URL = https://www.orsr.sk/
ENDPOINT = hladaj_zmeny.asp
[PROXY]
#HTTP_PROXY = socks5://user:pass@host:port
#HTTPS_PROXY = socks5://user:pass@host:port
[APP]
THREADS = 8
Set the MongoDB connection, the number of threads used for collecting the data, and optionally the SOCKS5 proxy parameters.
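To illustrate how the settings above fit together, here is a minimal sketch that reads the config file with Python's standard configparser module. It is not taken from the scraper itself: the function names load_config and settings_from, the fallback from "config.cfg" to "config_base.cfg", the default of 8 threads, and joining BASE_URL with ENDPOINT into a single URL are all assumptions made for illustration.

```python
from configparser import ConfigParser
from pathlib import Path
from urllib.parse import urljoin


def load_config(app_dir="."):
    """Read config.cfg if it exists, otherwise config_base.cfg (assumed behaviour)."""
    # interpolation=None keeps '%' characters (e.g. in proxy passwords) literal.
    parser = ConfigParser(interpolation=None)
    local = Path(app_dir) / "config.cfg"
    base = Path(app_dir) / "config_base.cfg"
    parser.read(local if local.exists() else base, encoding="utf-8")
    return parser


def settings_from(parser):
    """Collect the values from the sections shown above into a plain dict."""
    proxies = {}
    for scheme, key in (("http", "HTTP_PROXY"), ("https", "HTTPS_PROXY")):
        # Commented-out or missing proxy lines simply yield no entry.
        value = parser.get("PROXY", key, fallback=None)
        if value:
            proxies[scheme] = value
    return {
        "mongodb_uri": parser.get("DB", "MONGODB_URI"),
        "mongodb_db": parser.get("DB", "MONGODB_DB"),
        "mongodb_collection": parser.get("DB", "MONGODB_COLLECTION"),
        # Assumption: the page listing changed records is BASE_URL + ENDPOINT.
        "changes_url": urljoin(parser.get("WEB", "BASE_URL"),
                               parser.get("WEB", "ENDPOINT")),
        # Assumption: fall back to 8 threads if THREADS is not set.
        "threads": parser.getint("APP", "THREADS", fallback=8),
        "proxies": proxies,
    }
```

The proxies dict uses the scheme-to-URL shape that HTTP client libraries such as requests accept; with both proxy lines commented out, it stays empty and no proxy is used.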
Run the applications
1. Scraper
Run the scraper with:
venv/bin/python scraper.py
On Windows:
venv\Scripts\python.exe scraper.py
It will ask whether you want to download the "aktuálny" (current) or the "úplný" (full) version of each record.
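The THREADS setting suggests that records are downloaded in parallel. The following is a minimal sketch of that pattern using the standard library's ThreadPoolExecutor, not the scraper's actual code: fetch_record is a hypothetical placeholder for downloading and parsing a single record, and fetch_all is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_record(record_id):
    """Hypothetical placeholder: a real scraper would download and parse the record here."""
    return {"id": record_id}


def fetch_all(record_ids, threads=8):
    """Process all records with a pool of worker threads, keeping the input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as record_ids.
        return list(pool.map(fetch_record, record_ids))
```

Threads suit this kind of work because most of the time is spent waiting on network responses rather than on computation.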
2. Flask
Start the Flask application:
venv/bin/python flaskapp.py
On Windows:
venv\Scripts\python.exe flaskapp.py
You can now query the data from the local test server, which usually runs on http://127.0.0.1:5000.
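As a rough sketch of how the two endpoints could be queried from Python using only the standard library: the helper names detail_url, list_url and get_json are made up for this example, the ICO "12345678" used below is a placeholder, and the sketch assumes that both endpoints respond with JSON. The parameter that controls pagination of /list is not documented here, so it is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Default address of the local Flask test server.
BASE = "http://127.0.0.1:5000"


def detail_url(ico, base=BASE):
    """Build the URL for the /detail endpoint with its ico parameter."""
    return f"{base}/detail?{urlencode({'ico': ico})}"


def list_url(base=BASE):
    """Build the URL for the /list endpoint (pagination parameters omitted)."""
    return f"{base}/list"


def get_json(url, timeout=10):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the Flask app running, get_json(list_url()) should return the first page of records and get_json(detail_url("12345678")) the record for that placeholder ICO, if it exists in the database.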