Analysis of formal characteristics of text in the CPACT Research: Enhancing the LIWC linguistic processing for the Czech language

  • DALIBOR KUCERA
  • JIRI HAVIGER
Keywords: Psycholinguistics, Psychodiagnostics, Computational Linguistics, Text, Personality, CPACT

Abstract

Aim: This paper describes how quantitative text analysis has been adopted in psycholinguistics and psychodiagnostics to process spoken Czech. The method uses computer-assisted linguistic procedures to categorize and quantify formal characteristics of recorded texts, such as their morphology and semantics.
Method: The sample comprises 200 people selected by age, gender, and level of education so that each target group is represented in the same proportion as in the total Czech population. Formal text analysis relies on two processes: lemmatization (identifying a lexical unit as a dictionary entry) and disambiguation (resolving ambiguity in the interpretation of a particular word, including homonymy).
Findings: In total, the CPACT study uses 212 linguistic variables. This is considerably more than the 29 grammatical/summary variables processed by the Linguistic Processes module of the LIWC 2015 program. Whereas the set of linguistic variables in LIWC is limited, the grammatical categories and subcategories used in the CPACT study allow for a much more in-depth exploratory study.
Implications/Novel Contribution: The results of this study provide new information on the experimental application of quantitative psycholinguistic analysis to the formal parameters of text. The approach generates a range of hypotheses and directions for further study. Research in this area, whether by linguists or psychologists, has the potential to yield new insights into the makeup and dynamics of human communication.
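
The two processes named in the Method, lemmatization and disambiguation, followed by the quantification of grammatical categories, can be illustrated with a minimal Python sketch. It is not the CPACT pipeline: the toy lexicon, the part-of-speech labels, the single context rule, and the function names `analyze` and `category_shares` are all assumptions made for illustration. The Czech form "ženu" serves as the homonymous example, since it can be a form of the noun "žena" or of the verb "hnát".

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption, not CPACT data):
# word form -> candidate (lemma, part-of-speech) readings.
LEXICON = {
    "vidím": [("vidět", "VERB")],
    "mladou": [("mladý", "ADJ")],
    "ženu": [("žena", "NOUN"), ("hnát", "VERB")],  # homonymous form
    "stádo": [("stádo", "NOUN")],
    "domů": [("domů", "ADV")],
}


def analyze(tokens):
    """Lemmatize each token and pick one reading per token.

    Disambiguation uses a single illustrative rule: an ambiguous form
    directly after an adjective is read as a noun, otherwise as a verb.
    """
    result = []
    prev_pos = None
    for token in tokens:
        form = token.lower()
        # Unknown forms keep their surface form as the lemma.
        candidates = LEXICON.get(form, [(form, "UNKNOWN")])
        if len(candidates) > 1:
            preferred = "NOUN" if prev_pos == "ADJ" else "VERB"
            chosen = next(
                (c for c in candidates if c[1] == preferred), candidates[0]
            )
        else:
            chosen = candidates[0]
        result.append(chosen)
        prev_pos = chosen[1]
    return result


def category_shares(analysis):
    """Return the relative frequency of each part-of-speech category."""
    counts = Counter(pos for _, pos in analysis)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}


if __name__ == "__main__":
    for sentence in (["Vidím", "mladou", "ženu"], ["Ženu", "stádo", "domů"]):
        analysis = analyze(sentence)
        print(analysis, category_shares(analysis))
```

Each relative frequency produced this way corresponds to one linguistic variable per text. A study with many grammatical categories and subcategories would need a full morphological analyzer and a far richer tag set than this sketch assumes.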

Published
2019-04-22
Section
Articles