Year 2010 | Volume 117 | Issue 4 | Pages 716-720

Article title

Linguistic Complexity: English vs. Polish, Text vs. Corpus

Languages of publication

EN

Abstracts

EN
We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks down after about two decades, whereas for the inflected word forms it may hold over the whole range of ranks. We also find that for a corpus consisting of texts written by different authors the basic scale-invariant regime is broken more strongly than for a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scale-invariant regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, we find that if the words are tagged with their proper part of speech, only verbs show a rank-frequency distribution that is almost scale-invariant.
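
The rank-frequency analysis mentioned in the abstract can be illustrated with a short sketch. This is not the authors' procedure: the tokenizer, the function names rank_frequency and zipf_slope, and the plain least-squares fit in log-log coordinates are all assumptions made for illustration. A scale-invariant (Zipf-like) regime would show up as an approximately straight line in the log-log plot, with a slope close to -1 in the classic case.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word frequencies sorted from most to least frequent.

    The position in the returned list (starting at 1) is the rank.
    """
    # Assumed tokenizer: lowercase runs of letters, Unicode-aware so that
    # Polish diacritics are kept. No lemmatization is performed here.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    return sorted(counts.values(), reverse=True)


def zipf_slope(freqs):
    """Least-squares slope of log10(frequency) against log10(rank).

    Requires at least two ranks; a value near -1 corresponds to the
    classic Zipf scaling.
    """
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

To look for a break in scaling of the kind the abstract describes, one could fit the slope separately over different rank ranges (for example, ranks 1 to 100 and ranks above 100) and compare the results. Comparing lemmatized with inflected forms would additionally require a lemmatizer, which this sketch does not include.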

Contributors

author
  • Institute of Nuclear Physics, Polish Academy of Sciences, Kraków, Poland
author
  • Institute of Nuclear Physics, Polish Academy of Sciences, Kraków, Poland
  • Faculty of Mathematics and Natural Sciences, University of Rzeszów, Rzeszów, Poland
author
  • Institute of Nuclear Physics, Polish Academy of Sciences, Kraków, Poland

References

  • 1. M.A. Nowak, J.B. Plotkin, V.A.A. Jansen, Nature 404, 495 (2000)
  • 2. G.K. Zipf, Human behavior and the principle of least effort, Addison-Wesley, Cambridge 1949
  • 3. B. Mandelbrot, Word 10, 27 (1954)
  • 4. R. Ferrer Cancho, R.V. Solé, Proc. Natl. Acad. Sci. USA 100, 788 (2003)
  • 5. G.A. Miller, Amer. J. Psychol. 70, 311 (1957)
  • 6. M.A. Montemurro, Physica A 300, 567 (2001)
  • 7. J. Joyce, Ulisses, translated by M. Słomczyński, Wydawnictwo Pomorze, Bydgoszcz 1992
  • 8. The British National Corpus website: http://www.natcorp.ox.ac.uk/
  • 9. G. Leech, P. Rayson, A. Wilson, Word Frequencies in Written and Spoken English: based on the British National Corpus, Longman London 2001
  • 10. R. Ferrer Cancho, R.V. Solé, J. Quant. Linguistics 8, 165 (2001)
  • 11. P.W. Anderson, Science 177, 393 (1972)

Identifiers

YADDA identifier

bwmeta1.element.bwnjournal-article-appv117n469kz