Detection algorithm for content on Internet web portals

Ulman, Krzysztof; Rzecki, Krzysztof

Journal

Czasopismo Techniczne

Article title

Detection algorithm for content on Internet web portals

Authors

Krzysztof Ulman, Krzysztof Rzecki

Languages of publication

PL

Abstracts

PL

The paper presents the steps taken in designing and implementing an algorithm for automatic recognition of web page content, based on analysis of the HTML structure. Here, the content of a web page means the article text together with its headline, excluding any other text such as menus, advertisements, user comments, and image captions.
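
The definition of content given in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which this record does not describe. It only assumes that the headline is the first non-empty `<h1>`, that article text sits in `<p>` elements, and that anything inside `nav`, `aside`, `footer`, `form`, `figcaption`, `script`, or `style` is non-article text. The names `ArticleExtractor`, `extract_article`, `SKIP_TAGS`, and `SAMPLE` are hypothetical and introduced here for illustration.

```python
from html.parser import HTMLParser

# Assumption of this sketch: text inside these elements is not article content.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "form", "figcaption"}


class ArticleExtractor(HTMLParser):
    """Keeps the first non-empty <h1> and all <p> text outside skipped regions."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0    # > 0 while inside an element from SKIP_TAGS
        self.headline = None
        self.paragraphs = []
        self._target = None    # "h1" or "p" while text is being collected
        self._buf = []

    def _flush(self):
        # Normalize whitespace and store the collected text, if any.
        text = " ".join("".join(self._buf).split())
        if self._target == "h1" and text:
            self.headline = text
        elif self._target == "p" and text:
            self.paragraphs.append(text)
        self._target, self._buf = None, []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ("h1", "p"):
            self._flush()  # also closes an unclosed previous <p> or <h1>
            if tag == "p" or self.headline is None:
                self._target = tag

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag == self._target:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0 and self._target is not None:
            self._buf.append(data)


def extract_article(html):
    """Returns (headline, body); headline is None when no <h1> is found."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return parser.headline, "\n".join(parser.paragraphs)


SAMPLE = """
<html><body>
<nav><p>Home | News | Contact</p></nav>
<article>
  <h1>Example   headline</h1>
  <p>First paragraph of the <b>article</b>.</p>
  <figure><img src="x.png"><figcaption>Image caption</figcaption></figure>
  <p>Second paragraph.</p>
</article>
<aside><p>Advertisement</p></aside>
<footer><p>User comments</p></footer>
</body></html>
"""

headline, body = extract_article(SAMPLE)
print(headline)
print(body)
```

Real portals rarely mark up menus, advertisements, and comments this cleanly, so a fixed tag list of this kind is only a starting point for structure-based recognition.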

Keywords

PL

web page content recognition, data mining, web scraping, data collection, web page structure analysis, HTML

Dates

online

2015-05-07

Contributors

author

Krzysztof Ulman

author

Krzysztof Rzecki

Identifiers

YADDA identifier

bwmeta1.element.ojs-nameId-6e3e8ea9-5a94-37a7-828b-e6cd5da23db6-year-2015-article-2093