comments and README.md

2023-09-28 17:08:50 +02:00
parent 68311135bf
commit 945b9c2195
6 changed files with 298 additions and 58 deletions

@@ -0,0 +1,89 @@
# ORSR Scraper
This application fetches all records in ORSR (orsr.sk) that changed on the current day.
The application consists of two parts:
### 1. Scraper:
- gets the data of all changed records
- either the "aktuálna" (current) or the "úplná" (full) version
- can use a SOCKS5 proxy
- stores the data in a MongoDB
### 2. Flask app:
- Minimal Flask app with two endpoints:
  - `/detail` with parameter `ico`
    - returns the JSON data for the record with the given `ico`
  - `/list`
    - returns a paginated list of records (`ico` and `obchodneMeno`)
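
To illustrate what a paginated `/list` response could contain, here is a minimal sketch in plain Python. The helper name `list_page`, the 1-based `page` argument, and the page size are assumptions made for illustration; the actual endpoint may paginate differently.

```python
def list_page(records, page=1, per_page=20):
    """Return one page of records reduced to ico and obchodneMeno.

    `page` is 1-based and `per_page` is an assumed page size;
    neither is taken from the application itself.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    chunk = records[start:start + per_page]
    # Keep only the two fields the /list endpoint is described as returning.
    return [{"ico": r["ico"], "obchodneMeno": r["obchodneMeno"]} for r in chunk]
```

Requesting a page past the end of the data simply yields an empty list, which is one common way to signal that there are no more results.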
## Setup
### Prerequisites
You need the following installed or accessible:
- a current version of Python
- MongoDB
- SOCKS5 proxy (optional)

Installing these is out of scope for this README.
### 1. Download the app
Download or clone the application.
### 2. venv and requirements
Open a terminal, change to the app folder, and create a virtual environment:
```
cd [appPath]
python -m venv venv
```
Install the requirements from `requirements.txt`:
```
# Linux/macOS:
venv/bin/pip install -r requirements.txt
# Windows:
venv\Scripts\pip.exe install -r requirements.txt
```
### 3. Config File
The repository ships a default config file, `config_base.cfg`.
For local changes, copy this base config file and save the copy as `config.cfg`. The config file has the following structure:
```
[DB]
MONGODB_URI = mongodb://localhost:27017
MONGODB_DB = softone
MONGODB_COLLECTION = orsr
[WEB]
BASE_URL = https://www.orsr.sk/
ENDPOINT = hladaj_zmeny.asp
[PROXY]
#HTTP_PROXY = socks5://user:pass@host:port
#HTTPS_PROXY = socks5://user:pass@host:port
[APP]
THREADS = 8
```
Set up the MongoDB connection, the number of threads used for collecting the data, and optionally the SOCKS5 proxy parameters.
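
As a minimal sketch of how these settings might be read, the example below uses Python's standard `configparser`. It assumes that `config.cfg` takes precedence over `config_base.cfg` when both exist; the helper names `load_config` and `proxy_settings` are hypothetical and not taken from the application.

```python
import configparser
from pathlib import Path


def load_config(folder="."):
    """Read config.cfg if it exists, otherwise fall back to config_base.cfg."""
    folder = Path(folder)
    local = folder / "config.cfg"
    path = local if local.exists() else folder / "config_base.cfg"
    # interpolation=None keeps '%' characters (e.g. in proxy passwords) literal.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return parser


def proxy_settings(parser):
    """Map scheme to proxy URL; empty while the proxy lines stay commented out."""
    if not parser.has_section("PROXY"):
        return {}
    section = parser["PROXY"]
    proxies = {}
    if "HTTP_PROXY" in section:
        proxies["http"] = section["HTTP_PROXY"]
    if "HTTPS_PROXY" in section:
        proxies["https"] = section["HTTPS_PROXY"]
    return proxies
```

With the structure shown above, `load_config().getint("APP", "THREADS")` would give the thread count as an integer, and `proxy_settings(...)` stays empty until the `[PROXY]` lines are uncommented.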
## Run the applications
### 1. Scraper
Run the scraper with
```
# Linux/macOS:
venv/bin/python scraper.py
# Windows:
venv\Scripts\python.exe scraper.py
```
It asks whether you want to download the "aktuálny" (current) or "úplný" (full) record.
### 2. Flask
Start the Flask application:
```
# Linux/macOS:
venv/bin/python flaskapp.py
# Windows:
venv\Scripts\python.exe flaskapp.py
```
You can now query the local test server, which by default runs on `http://127.0.0.1:5000`.
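
A minimal client sketch using only the standard library. It assumes the server runs on the default address and that `/detail` takes `ico` as a query-string parameter (the README names the parameter but not how it is passed). The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

BASE = "http://127.0.0.1:5000"  # default address of the local test server


def detail_url(ico, base=BASE):
    """Build the /detail URL for one record (query-string passing is assumed)."""
    return urljoin(base, "/detail") + "?" + urlencode({"ico": ico})


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the Flask app running, `fetch_json(detail_url(ico))` would return the stored record for a given `ico`, and `fetch_json(urljoin(BASE, "/list"))` the first page of the list.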