Miner

Miner is a generic engine and framework, written in Python, for processing and analyzing data.


Framework

As a framework, Miner lets you access different data sources (log files, relational and NoSQL databases, or plain CSV files) from a single shell. You can analyze data, export it, or transfer it from one data source to another. Miner can be easily extended to support new data sources or proprietary log formats. It provides a convenient (for some) console interface with context-based autocompletion.

Engine

As an engine, Miner introduces a simple yet powerful query language for data processing (you will see some examples below). Alongside its data mining instructions, this query language uses standard Python expressions and is not limited to a small function subset, as SQL is. This gives the user maximum freedom and allows integration with third-party modules.

Applications

Miner can be used for a wide variety of tasks, such as analyzing web server logs and financial data, post-processing the results of database queries, and even analyzing network captures. It can also be used to perform simulations and tests.

Query Examples

Suppose you have a log from an HTTP server in CSV format that looks something like this:

path,time
/index.html,10
/login,1000
/heavy_script?param=1,10000

With Miner you can easily find the user requests that took the most time:

READ log.csv | TOP 5 time | STDOUT

The output will be something like:

path                   time
----------------------------
/heavy_script?param=5  50000
/heavy_script?param=4  40000
/heavy_script?param=3  30000
/heavy_script?param=2  20000
/heavy_script?param=1  10000
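
For readers who want to see what this query computes, here is a rough plain-Python equivalent that uses only the standard library. It illustrates the semantics of READ and TOP and says nothing about how Miner implements them; the function name `top_by_time` and the inline sample data are assumptions made for this sketch.

```python
import csv
import heapq
import io

# Hypothetical sample standing in for log.csv (same columns as above).
SAMPLE_LOG = """path,time
/index.html,10
/login,1000
/heavy_script?param=1,10000
"""

def top_by_time(csv_text, n=5):
    """Return the n rows with the largest 'time' value, largest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # csv yields strings, so convert 'time' to int before comparing.
    return heapq.nlargest(n, rows, key=lambda row: int(row["time"]))

for row in top_by_time(SAMPLE_LOG):
    print(row["path"], row["time"])
```

The Miner query expresses the same thing in one line and without the explicit type conversion.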

To extract the number of requests and the total time consumed by each web resource, you would execute the following Miner query:

READ log.csv | SELECT path.split('?',1)[0] as resource, time | FOR DISTINCT resource SELECT count(True) as numRequests, sum(time) | STDOUT

The output will be something like:

resource       numRequests     time
------------------------------------
/heavy_script  10              100000
/index.html    5               50
/login         5               5000
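
To make the grouping step concrete, the sketch below does the same aggregation in plain Python. It is only an illustration of what FOR DISTINCT ... SELECT computes, not Miner's implementation; `summarize_by_resource` and the sample data are hypothetical names and values chosen for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for log.csv.
SAMPLE_LOG = """path,time
/index.html,10
/login,1000
/heavy_script?param=1,10000
/heavy_script?param=2,20000
"""

def summarize_by_resource(csv_text):
    """Map each resource (path without query string) to (numRequests, total time)."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        resource = row["path"].split("?", 1)[0]  # same expression as in the query
        totals[resource][0] += 1                 # count(True)
        totals[resource][1] += int(row["time"])  # sum(time)
    return {resource: tuple(stats) for resource, stats in totals.items()}

for resource, (num_requests, total_time) in summarize_by_resource(SAMPLE_LOG).items():
    print(resource, num_requests, total_time)
```

Note that `path.split('?',1)[0]` is the same Python expression in both versions, which is the point of allowing Python expressions inside queries.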

You can learn many other interesting facts the same way, without writing a single line of code.

You can easily export the data to an Excel file by running:

USE excel_xlsxw
READ log.csv | ... | WRITE export.xlsx

You can even add charts:

READ log.csv | ... | WRITE chartType="column" chartX="numRequests" chartY="numRequests" export.xlsx

Miner makes it possible to analyze plain data without inserting it into a database or pre-processing it in any other way. It also lets you process data stored in databases using the full power of Python and generate reports in different formats.

DB connection FETCH 'SELECT number FROM table' | SELECT mymodule.factorial(number) as factorial | WRITE report.csv

The Miner command above runs an SQL query on the database, then applies the function factorial from the Python module mymodule and saves the result to the file report.csv.
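
The module mymodule is not part of Miner; it stands for any module of your own. A minimal sketch of what such a module might contain is shown here. The file name and the decision to accept numeric strings are assumptions for illustration, and the module presumably has to be importable from the Python environment Miner runs in.

```python
# mymodule.py: hypothetical user module; the name comes from the example above.
import math

def factorial(number):
    """Return number! for a non-negative integer, accepting numeric strings too."""
    # int() guards against the database driver returning the value as a string.
    return math.factorial(int(number))
```

Any ordinary Python function can be used this way, which is what allows integration with third-party modules.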

The example below shows how to transfer data from one database to another:

DB con1 FETCH 'SELECT number FROM table' |
 SELECT number, mymodule.factorial(number) as factorial |
 DB con2 PUSH 'insert into factorials(num,fact) values(%s,%s)' WITH number, factorial
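
For comparison, the same transfer written by hand in Python might look like the sketch below. It uses the standard sqlite3 module with two in-memory databases standing in for con1 and con2, so the placeholders are `?` rather than `%s`, and the source table is called `numbers` because `table` is a reserved word in SQL. All table names and the function `transfer_factorials` are assumptions made for this illustration.

```python
import math
import sqlite3

def transfer_factorials(source, target):
    """Copy numbers from source to target, adding a computed factorial column."""
    rows = source.execute("SELECT number FROM numbers").fetchall()
    target.executemany(
        "INSERT INTO factorials(num, fact) VALUES (?, ?)",
        # Store the factorial as text because SQLite integers are limited to 64 bits.
        ((number, str(math.factorial(number))) for (number,) in rows),
    )
    target.commit()

# Two in-memory databases stand in for the connections con1 and con2.
con1 = sqlite3.connect(":memory:")
con1.execute("CREATE TABLE numbers(number INTEGER)")
con1.executemany("INSERT INTO numbers VALUES (?)", [(3,), (5,)])

con2 = sqlite3.connect(":memory:")
con2.execute("CREATE TABLE factorials(num INTEGER, fact TEXT)")

transfer_factorials(con1, con2)
print(con2.execute("SELECT num, fact FROM factorials ORDER BY num").fetchall())
```

The Miner pipeline covers the fetch, the per-row computation and the insert in three short stages.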

Supported Data Sources

Miner supports many data sources out of the box; others are supported by Miner extension tools. The following are currently supported:

Windows Utilities

There are many powerful utilities available on Unix systems. There are different ways to get them running on the Windows platform, for example via Cygwin. We chose a different approach and implemented some of them in pure Python, for the needs of Miner, for our own needs, and for your pleasure. The Python port has a big advantage over other implementations: it has a single dependency, the Python interpreter. Visit the GnuPy home page for more details and downloads.


Mail Us@minersoft