Lately, I have been researching the linguistic perspective on writing quality: what frameworks and theories others have used in research, what their methodologies are, how they define “quality,” etc. I found quite a few articles, but after reading through them, I realized that they are all pure Computational Linguistics, in both their theoretical frameworks and their methodologies. Most of the recent ones are trying to solve the problem of getting a computer to determine whether its natural language generation (NLG) output is “good” or not (for example: are automated summaries coherent?).
Almost all of the articles I found equate quality with coherence/cohesion. They sometimes give a passing nod to Halliday and Hasan (1976), but not much more than that. Instead, they focus on theories from Computational Linguistics research, such as Centering Theory (Grosz et al., 1983), Grosz and Sidner's (1986) account of attention, intentions, and the structure of discourse, or Rhetorical Structure Theory (Mann and Thompson, 1988). Others draw on cognitive psychology work, such as “Coherence in text, coherence in mind,” an article by Givón (1993).
The studies I have been reading rely heavily on formulas and statistical models such as Hidden Markov Models, trying to find a model of language that fits the data and correlates with some human judgement of quality. I am not sure how far I will go down that path, but of everything I read, Rhetorical Structure Theory looks the most interesting and might be applicable to my research as an analysis tool. It is definitely the most popular framework in the articles I have seen.
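To make the idea of a “model of language that fits the data” a little more concrete, here is a minimal sketch of the entity-grid representation behind Barzilay and Lapata (2008), one of the most-cited coherence models in the reference list. It is a simplification under loud assumptions: entities and their grammatical roles (S = subject, O = object, X = other, - = absent) are taken as already extracted, whereas the original derives them with a parser and coreference resolution; only length-2 transitions are counted; entity salience is ignored; and the names `build_grid` and `transition_probs` are my own hypothetical labels, not functions from any library.

```python
from collections import Counter
from itertools import product

# Grammatical roles in a simplified entity grid (assumption: roles are given,
# not derived with a parser): subject, object, other, absent from the sentence.
ROLES = ("S", "O", "X", "-")


def build_grid(sentences):
    """Map each entity to its role in every sentence ('-' where absent).

    `sentences` is a list of dicts, one per sentence, mapping entity -> role.
    """
    entities = sorted({entity for sent in sentences for entity in sent})
    return {e: [sent.get(e, "-") for sent in sentences] for e in entities}


def transition_probs(grid, n=2):
    """Relative frequency of every length-n role transition across the grid."""
    counts = Counter()
    for roles in grid.values():
        for i in range(len(roles) - n + 1):
            counts[tuple(roles[i:i + n])] += 1
    total = sum(counts.values())
    # One feature per possible transition, so every text gets a vector
    # of the same length and texts can be compared.
    return {t: (counts[t] / total if total else 0.0)
            for t in product(ROLES, repeat=n)}


# Hypothetical toy text: three sentences with entities already role-tagged.
text = [
    {"Microsoft": "S", "market": "O"},
    {"Microsoft": "S", "earnings": "X"},
    {"analysts": "S", "Microsoft": "O"},
]
grid = build_grid(text)
probs = transition_probs(grid)
print(grid["Microsoft"])   # ['S', 'S', 'O']
print(probs[("S", "S")])   # 0.125
print(probs[("-", "-")])   # 0.25
```

In the full approach, these transition probabilities become a feature vector for a learned ranking model, trained to prefer a text's original sentence order over shuffled permutations; that learning step, and any correlation with human quality ratings, is left out of this sketch.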
Unfortunately, my research purpose and rationale are not as focused as I would like them to be at this point. I had hoped to narrow them down sooner rather than later. But maybe I should just gather my data, pick a topic (or at least a linguistic level), dive in, and see what happens.
References
Givón, T. (1993). Coherence in text, coherence in mind. Pragmatics & Cognition, 1(2), 171-227.
Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 175-204.
Grosz, B. J., Joshi, A. K., & Weinstein, S. (1983). Providing a unified account of definite noun phrases in discourse. In Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics (pp. 44-50).
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243-281.
A Sample of Key (Highly Cited) Computational Linguistics Articles on Cohesion/Coherence and Writing/Text Quality
Barzilay, R., & Lapata, M. (2008). Modeling local coherence: An entity-based approach. Computational Linguistics, 34(1), 1-34.
Carlson, L., Marcu, D., & Okurowski, M. E. (2003). Building a discourse-tagged corpus in the framework of rhetorical structure theory. In Current and New Directions in Discourse and Dialogue (pp. 85-112). Springer.
Crossley, S. A., & McNamara, D. S. (2011). Text coherence and judgments of essay quality: Models of quality and coherence. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 1236-1241).
Elsner, M., Austerweil, J. L., & Charniak, E. (2007). A unified local and global model for discourse coherence. In Proceedings of HLT-NAACL (pp. 436-443).
Gordon, P. C., Grosz, B. J., & Gilliom, L. A. (1993). Pronouns, names, and the centering of attention in discourse. Cognitive Science, 17(3), 311-347.
Lapata, M., & Barzilay, R. (2005). Automatic evaluation of text coherence: Models and representations. In Proceedings of IJCAI (Vol. 5, pp. 1085-1090).
Lin, Z., Ng, H. T., & Kan, M.-Y. (2011). Automatically evaluating text coherence using discourse relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 997-1006).
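Louis, A., & Nenkova, A. (2012). A coherence model based on syntactic patterns. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).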
Louis, A., & Nenkova, A. (2013). A corpus of science journalism for analyzing writing quality. Dialogue & Discourse, 4(2), 87-117.
Pitler, E., & Nenkova, A. (2008). Revisiting readability: A unified framework for predicting text quality. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 186-195).
Pitler, E., Louis, A., & Nenkova, A. (2010). Automatic evaluation of linguistic quality in multi-document summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 544-554).
Soricut, R., & Marcu, D. (2003). Sentence level discourse parsing using syntactic and lexical information. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1 (pp. 149-156).