The paper presents the steps taken while designing and implementing an algorithm that automatically recognizes the content of a web page by analyzing its HTML structure. Here, the content of a web page is the text of the article together with its headline, excluding all other text such as menus, advertisements, users’ comments and image captions.
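The abstract does not spell out the recognition rules. As a rough illustration of what content recognition based on HTML structure can look like, here is a minimal sketch using only Python's standard `html.parser`. Its heuristic is an assumption and not the paper's algorithm. Text inside elements that usually hold navigation, scripts, sidebars, footers, forms or image captions is discarded. The first `<h1>` is taken as the headline. The article body is the block container whose own `<p>` paragraphs hold the most characters. The tag lists and the names `ContentExtractor`, `normalize` and `extract_content` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed, illustrative tag lists (not taken from the paper).
# Text inside these elements is treated as non-content:
# scripts, menus, sidebars/ads, footers, forms and image captions.
SKIP_TAGS = {"script", "style", "noscript", "nav", "aside",
             "footer", "form", "figcaption"}
# Block elements that may hold the article body (assumed).
CONTAINER_TAGS = {"body", "main", "article", "section", "div", "td"}


def normalize(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


class ContentExtractor(HTMLParser):
    """Collects the first <h1> and the <p> texts of each container."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0         # > 0 while inside a SKIP_TAGS element
        self.open_containers = []   # ids of currently open containers
        self.next_id = 0
        self.paragraphs = {}        # container id -> its paragraph texts
        self.headline = ""
        self.headline_parts = None  # buffer while inside the first <h1>
        self.p_parts = None         # buffer while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return                  # ignore structure inside skipped elements
        elif tag in CONTAINER_TAGS:
            self.open_containers.append(self.next_id)
            self.next_id += 1
        elif tag == "h1" and not self.headline:
            self.headline_parts = []
        elif tag == "p":
            self.p_parts = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in CONTAINER_TAGS:
            if self.open_containers:
                self.open_containers.pop()
        elif tag == "h1" and self.headline_parts is not None:
            self.headline = normalize("".join(self.headline_parts))
            self.headline_parts = None
        elif tag == "p" and self.p_parts is not None:
            text = normalize("".join(self.p_parts))
            if text and self.open_containers:
                # Attribute the paragraph to its innermost open container.
                self.paragraphs.setdefault(self.open_containers[-1], []).append(text)
            self.p_parts = None

    def handle_data(self, data):
        if self.skip_depth:
            return                  # text of skipped elements is discarded
        if self.headline_parts is not None:
            self.headline_parts.append(data)
        if self.p_parts is not None:
            self.p_parts.append(data)


def extract_content(html):
    """Return (headline, article_text) via the largest-paragraph-text heuristic."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    if not parser.paragraphs:
        return parser.headline, ""
    # Assumption: the article body is the container whose own
    # paragraphs contain the most characters in total.
    best = max(parser.paragraphs.values(),
               key=lambda texts: sum(len(t) for t in texts))
    return parser.headline, "\n".join(best)
```

Choosing the container with the most paragraph text is a cheap way to separate the article from short blocks such as menus and short comments. It misfires when, for example, a comment thread is longer than the article itself. A practical recognizer would therefore need more structural rules than this sketch.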