As a framework, Miner gives access to different data sources (log files, relational and NoSQL databases, or plain csv files) from a single shell. You can analyze data, export it, or transfer it from one data source to another. Miner can easily be extended to support new data sources or proprietary log formats. It provides a console interface, convenient for some, with context-based autocompletion.
As an engine, Miner introduces a simple yet powerful query language for data processing (you will see some examples below). Alongside its data mining instructions, this query language uses standard Python expressions and is not limited to a small function subset, as SQL is. This gives the user maximum freedom and allows integration with third-party modules.
Miner can be used for a wide variety of tasks, such as analyzing web server logs and financial data, post-processing the results of database queries, and even analyzing network captures. It can also be used to perform simulations and tests.
Suppose you have a log from an HTTP server in csv format that looks something like this:
path,time
/index.html,10
/login,1000
/heavy_script?param=1,10000
With Miner you can easily find the user requests that took the most time:
READ log.csv | TOP 5 time | STDOUT
The output will be something like:
path                   time
----------------------------
/heavy_script?param=5  50000
/heavy_script?param=4  40000
/heavy_script?param=3  30000
/heavy_script?param=2  20000
/heavy_script?param=1  10000
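For comparison, roughly the same result can be computed in plain Python. This is only a sketch of what the query does, not Miner's implementation. The function name `top_by_time` and the inline `SAMPLE_LOG` data are assumptions made for illustration.

```python
import csv
import heapq
import io

# Sample rows in the same shape as log.csv (inline so the sketch is self-contained).
SAMPLE_LOG = """path,time
/index.html,10
/login,1000
/heavy_script?param=1,10000
"""


def top_by_time(lines, n=5):
    """Return the n rows with the largest 'time' value, slowest first."""
    rows = csv.DictReader(lines)
    # The csv module yields strings, so convert 'time' before comparing.
    return heapq.nlargest(n, rows, key=lambda row: int(row["time"]))


if __name__ == "__main__":
    for row in top_by_time(io.StringIO(SAMPLE_LOG)):
        print(row["path"], row["time"])
```

To run it against a real file, pass an open file object instead of the `io.StringIO` wrapper.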
To extract the number of requests and the total time consumed by each web resource, you would execute the following Miner query:
READ log.csv | SELECT path.split('?',1)[0] as resource, time | FOR DISTINCT resource SELECT count(True) as numRequests, sum(time) | STDOUT
The output will be something like:
resource       numRequests  time
------------------------------------
/heavy_script  10           100000
/index.html    5            50
/login         5            5000
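The grouping step can be sketched in plain Python as well, to show what the query computes. This is not how Miner works internally. The function name `summarize` and the inline sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# A few sample rows in the same shape as log.csv.
SAMPLE_LOG = """path,time
/index.html,10
/login,1000
/heavy_script?param=1,10000
/heavy_script?param=2,20000
"""


def summarize(lines):
    """Map each resource to [number of requests, total time]."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(lines):
        # Keep only the part of the path before the query string.
        resource = row["path"].split("?", 1)[0]
        totals[resource][0] += 1
        totals[resource][1] += int(row["time"])
    return dict(totals)


if __name__ == "__main__":
    for resource, (num_requests, total) in summarize(io.StringIO(SAMPLE_LOG)).items():
        print(resource, num_requests, total)
```

Note that `str.split` returns a list, so the sketch takes element `[0]` to get the resource name itself.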
You can learn these and many other interesting facts without writing a single line of code.
You can easily export the data to an Excel file by running:
USE excel_xlsxw
READ log.csv | ... | WRITE export.xlsx
You can even add charts:
READ log.csv | ... | WRITE chartType="column" chartX="numRequests" chartY="numRequests" export.xlsx
Miner lets you analyze plain data without inserting it into a database or pre-processing it in any other way. It also lets you process data stored in databases using the full power of Python, and generate reports in different formats.
DB connection FETCH 'SELECT number FROM table' | SELECT mymodule.factorial(number) as factorial | WRITE report.csv
The Miner command above performs an SQL query on the database, then applies the function factorial from the Python module mymodule and saves the result to the report.csv file.
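Here mymodule stands for whatever Python module you supply yourself. A minimal hypothetical version might look like the sketch below. The file name `mymodule.py` and the conversion with `int()` are assumptions, and how Miner locates the module is not covered here.

```python
# mymodule.py (hypothetical): any importable Python module would do.
import math


def factorial(number):
    """Return number! for a non-negative integer.

    The value is passed through int() first, on the assumption that it may
    arrive from the database as a string such as '5'.
    """
    return math.factorial(int(number))
```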
The example below shows how to transfer data from one database to another:
DB con1 FETCH 'SELECT number FROM table' | SELECT number, mymodule.factorial(number) as factorial | DB con2 PUSH 'insert into factorials(num,fact) values(%s,%s)' WITH number, factorial
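To make the data flow concrete, here is a rough plain-Python sketch of the same transfer using the standard sqlite3 module. It is not Miner's implementation. The source table name `numbers`, the in-memory databases and the function name `transfer` are assumptions made for illustration. Note that sqlite3 uses `?` placeholders where the query above uses `%s`.

```python
import math
import sqlite3


def transfer(src, dst):
    """Copy each number from src to dst together with its factorial."""
    rows = src.execute("SELECT number FROM numbers")
    dst.executemany(
        "INSERT INTO factorials(num, fact) VALUES (?, ?)",
        ((number, math.factorial(number)) for (number,) in rows),
    )
    dst.commit()


if __name__ == "__main__":
    # Two in-memory databases stand in for con1 and con2.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE numbers(number INTEGER)")
    src.executemany("INSERT INTO numbers VALUES (?)", [(3,), (5,)])

    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE factorials(num INTEGER, fact INTEGER)")

    transfer(src, dst)
    print(dst.execute("SELECT num, fact FROM factorials ORDER BY num").fetchall())
```

The sketch keeps the inputs small because SQLite integers are limited to 64 bits, which factorials exceed quickly.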
Supported Data Sources
Miner supports many data sources out of the box; others are supported by Miner extension tools. The following are currently supported:
- plain log files, tsv
- csv files
- json files
- excel files for reports generation
- ncsa (common log format) and customized apache logs
- sqlite database files
- mysql database
- pcap - network captures
Many powerful utilities are available on Unix systems. There are several ways to run them on Windows, for example via Cygwin. We chose a different route and implemented some of them in pure Python, for the needs of Miner, for our own needs and for your pleasure. The Python port has one large advantage over other implementations: its only dependency is the Python interpreter. Visit the GnuPy home page for more details and downloads.