# ORSR Scraper

With this application you can get all records in ORSR (Obchodný register Slovenskej republiky, the Slovak Business Register) that changed on the current day. The application consists of two parts:

### 1. Scraper

- gets the data of all changed records
- downloads either the "aktuálny" (current) or the "úplný" (full) version of each record
- can use a Socks5 proxy
- stores the data in MongoDB

### 2. Flask app

A minimalistic Flask app with two endpoints:

- `/detail` with the parameter `ico`: returns the JSON data of the record with the given IČO
- `/list`: returns a paginated list of records (`ico` and `obchodneMeno`)

## Setup

### 1. Prerequisites

You need to have installed, or have access to:

- a current version of Python
- MongoDB
- a Socks5 proxy (optional)

Installing these is out of the scope of this README.

### 2. Download the app

Download or clone the application.

### 3. venv and requirements

Open a terminal, `cd` to the app folder and create a virtual environment:

```
cd [appPath]
python -m venv venv
```

Install the requirements from `requirements.txt`:

```
venv/bin/pip install -r requirements.txt
```

On Windows:

```
venv\Scripts\pip.exe install -r requirements.txt
```

### 4. Config file

There is a default config file, `config_base.cfg`. For local changes, copy this base config file and save it as `config.cfg`. The config file has the following structure:

```
[DB]
MONGODB_URI = mongodb://localhost:27017
MONGODB_DB = softone
MONGODB_COLLECTION = orsr

[WEB]
BASE_URL = https://www.orsr.sk/
ENDPOINT = hladaj_zmeny.asp

[PROXY]
#HTTP_PROXY = socks5://user:pass@host:port
#HTTPS_PROXY = socks5://user:pass@host:port

[APP]
THREADS = 8
```

Set up the connection to MongoDB, the number of threads used for collecting the data and, optionally, the Socks5 proxy parameters.

## Run the applications

### 1. Scraper

Run the scraper with:

```
venv/bin/python scraper.py
```

On Windows:

```
venv\Scripts\python.exe scraper.py
```

It will ask whether you want to download the "aktuálny" (current) or the "úplný" (full) record.

Note: when used with the ThreadPool, the tqdm progress bar sometimes continues on a new line.

### 2. Flask app

Start the Flask application:

```
venv/bin/python flaskapp.py
```

On Windows:

```
venv\Scripts\python.exe flaskapp.py
```

Now you can get the data from the local test server, which usually runs on `http://127.0.0.1:5000`.
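
A minimal client sketch for querying the two endpoints of the running Flask app, using only the Python standard library. It assumes that `ico` is passed as a query-string parameter (`/detail?ico=...`) and that both endpoints return JSON. The helper names `detail_url`, `list_url` and `fetch_json` are hypothetical, and the pagination parameter of `/list` is not documented in this README, so any name passed to `list_url` (e.g. `page`) is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Usual address of the local Flask test server.
BASE_URL = "http://127.0.0.1:5000"


def detail_url(ico: str, base_url: str = BASE_URL) -> str:
    """Build the /detail URL, assuming `ico` is sent as a query-string parameter."""
    return f"{base_url}/detail?{urlencode({'ico': ico})}"


def list_url(base_url: str = BASE_URL, **params: object) -> str:
    """Build the /list URL; any pagination parameters (names are assumptions,
    e.g. a hypothetical `page`) are appended as a query string."""
    query = urlencode(params)
    return f"{base_url}/list?{query}" if query else f"{base_url}/list"


def fetch_json(url: str, timeout: float = 10.0) -> object:
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With `flaskapp.py` running, `fetch_json(list_url())` should return the list of records and `fetch_json(detail_url("<IČO>"))` the detail of a single record, provided the endpoints behave as assumed above.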