parsers module

parsers.parse_xml(x)

Parse an XML file.

Parameters: x – [String] Path to an XML file
Returns: an object of class lxml.etree._Element

Usage:

from pyminer import fetch
from pyminer import parsers
url = "https://peerj.com/articles/cs-23.xml"
out = fetch(url)
parsers.parse_xml(out.path)
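
Working with the result: a minimal sketch, assuming the returned object follows the ElementTree API (find, findall), as lxml elements do. To stay self-contained it uses the standard library's xml.etree.ElementTree and a hand-written sample instead of a fetched file. The sample document and the helper names article_title and section_titles are hypothetical and not part of pyminer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample standing in for a fetched article file.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group><article-title>An example title</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p></sec>
  </body>
</article>"""


def article_title(root):
    """Return the text of the first <article-title> element, or None."""
    node = root.find(".//article-title")
    return node.text if node is not None else None


def section_titles(root):
    """Return the titles of all <sec> elements in document order."""
    return [t.text for t in root.findall(".//sec/title")]


root = ET.fromstring(SAMPLE)
print(article_title(root))
print(section_titles(root))
```

The same find and findall calls should apply to the object returned by parsers.parse_xml, provided the file uses comparable element names.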

parsers.parse_xml_string(x, encoding='UTF-8')

Parse an XML file and return its contents as a string.

Parameters:
x – [String] Path to an XML file
encoding – [String] Character encoding, 'UTF-8' by default
Returns: a string

Usage:

from pyminer import fetch
from pyminer import parsers
url = "https://peerj.com/articles/cs-23.xml"
out = fetch(url)
parsers.parse_xml_string(out.path)
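
How an encoding argument can affect XML serialization: a minimal sketch using the standard library's xml.etree.ElementTree. It is not pyminer's implementation, which this page does not show. It only illustrates the general difference between serializing an element to text and to encoded bytes. The sample element is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document containing a non-ASCII character.
root = ET.fromstring("<p>caf\u00e9</p>")

# encoding="unicode" yields a str; a named encoding yields bytes.
as_text = ET.tostring(root, encoding="unicode")
as_bytes = ET.tostring(root, encoding="UTF-8")

print(type(as_text).__name__, type(as_bytes).__name__)

# Decoding the bytes with the same encoding recovers the characters.
print("caf\u00e9" in as_bytes.decode("UTF-8"))
```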

parsers.parse_plain(x)

Parse a plain text file.

Parameters: x – [String] Path to a plain text file
Returns: a string

Usage:

from pyminer import fetch
from pyminer import parsers
url = "xx"
out = fetch(url)
parsers.parse_plain(out.path)

parsers.parse_pdf(x)

Parse a PDF file into a string.

Parameters: x – [String] Path to a PDF file
Returns: a string

Usage:

from pyminer import fetch
from pyminer import parsers
url = "http://www.banglajol.info/index.php/AJMBR/article/viewFile/25509/17126"
out = fetch(url)
parsers.parse_pdf(out.path)
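
Post-processing the returned string: a minimal sketch of one way to tidy extracted text and count terms, assuming the string contains line breaks and words hyphenated across lines, as text extracted from PDFs often does. The helpers normalize and top_terms and the sample text are hypothetical and not part of pyminer.

```python
import re
from collections import Counter


def normalize(text):
    """Rejoin words hyphenated across line breaks, then collapse whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return re.sub(r"\s+", " ", text).strip()


def top_terms(text, n=3):
    """Return the n most frequent lowercase alphabetic tokens with counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample standing in for the output of a parser call.
sample = "Text min-\ning of  open access\narticles.\n\nText mining tools."
clean = normalize(sample)
print(clean)
print(top_terms(clean, 2))
```

The same helpers apply to the output of parsers.parse_plain, since both functions are documented as returning a string.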