Syntactic Complexity and Lexical Diversity in English Conference Abstracts: Investigating Cross-Disciplinary Effects with Native Speaker Baseline

Document Type: Original Article

Author

Associate Professor, Helwan University

Abstract

This study investigates potential cross-disciplinary effects on the extent of syntactic complexity and lexical diversity (SC & LD, respectively) in English conference abstracts authored by Egyptian (Arabic L1) researchers in two disciplines: Linguistics and Nuclear Science. The study establishes a native speaker baseline through parallel analysis of British-authored abstracts in the two disciplines under investigation. The data comprises 100 single-authored English conference abstracts, evenly divided over four contrastive categories: Eg(yptian)-Ling(uistics), Eg-N(uclear) Sc(ience), Br(itish)-Ling, and Br-NSc. Using two computational tools, the L2 Syntactic Complexity Analyzer (L2SCA) and TEXT INSPECTOR, scores of SC and LD, respectively, have been extracted and analyzed in MS Excel using standard statistical tests. The results have indicated significant and uniform cross-disciplinary effects in both the native and non-native groups in terms of SC, where the Ling abstracts have displayed longer and more complex production units. Furthermore, significant language nativity effects have been observed in terms of SC: English natives have been found to use more subordination, which is characteristic of more mature writing in their L1, whereas Arabic natives have made greater use of coordination, the preferred structure-combining operation in their L1. In terms of LD, the native groups have outperformed the non-native groups across both disciplines.

Keywords


1.      Introduction

Linguistic complexity may be viewed as “a dynamic property of the learner’s L2 system at large …the degree of elaboration, the size, breadth, width, or richness of the learner’s L2 system or ‘repertoire’, that is, … the number, range, variety or diversity of different structures and items that he [the learner] knows or uses” (Bulté & Housen[i], 2012, p. 25), as “[t]he extent to which language produced in performing a task is elaborate and varied” (Ellis, 2003, p. 340; as cited in Lu & Ai, 2015, p. 17), or as “the range and the sophistication of grammatical resources exhibited in language production” (Ortega, 2015, p. 82).

Linguistic complexity, however, is not the sole measure of an L2 learner’s performance; two other performance descriptors, accuracy and fluency, complete the Complexity-Accuracy-Fluency (CAF) triad of the L2 proficiency model proposed by Skehan (1989) “for the oral and written assessment of language learners as well as indicators of learners’ proficiency underlying their performance” (as cited in Housen & Kuiken, 2009, p. 461). Of the three CAF components, complexity has received the most attention in L2 research (Ansarifar, Shahriari, & Pishghadam, 2018).

Linguistic complexity has been extensively researched and recognized in L2 research as a multidimensional construct operationalized through a wide range of mostly automatic measures (Lu, 2010; Crossley & McNamara, 2012, 2014; Kalantari & Gholami, 2017) targeting different language domains: lexical, morphological, syntactic/grammatical, and phonological (Bulté & Housen, 2012). In the literature, most studies have focused on the lexical and/or syntactic domains in the written production of L2 learners of different levels of proficiency (Wang & Slater, 2016), of different L1 backgrounds (Lu & Ai, 2015), of different levels of pragmatic competence (Youn, 2014), reading texts of varying degrees of complexity (Douglas & Miller, 2016), writing on different topics (Yang, Lu, & Weigle, 2015), and targeting different L2s (Bulon, Hendrikx, Meunier, & Van Goethem, 2017).

The genre that has received most attention in the literature is learners’ (argumentative) essays; very few studies have investigated linguistic complexity in research abstracts (Ansarifar et al., 2018). Furthermore, very few studies have attempted stronger validation of their results by comparing L2 performance to that of native speakers (Foster & Tavakoli, 2009; Lu & Ai, 2015). This is, in fact, one of the distinctive aspects of the present study, as explained in more detail in Section 2 (Research Statement and Hypotheses), Section 3 (Previous Studies), and Section 7 (Analysis). With regard to the tools of analysis, the present study has employed the L2 Syntactic Complexity Analyzer (L2SCA) and TEXT INSPECTOR for the syntactic and lexical domains, respectively (Sections 4 and 5). Statistical analysis has been carried out in MS Excel (Section 6, Data and Methodology). Notably, before the analysis, all abstracts have been screened for obvious grammaticality issues through the MS Word grammar checker, as shown in Section 7.

2.      Research Statement and Hypotheses

The present study investigates potential cross-disciplinary (Linguistics vs. Nuclear Science) effects on both syntactic complexity and lexical diversity (SC & LD, respectively) in one of the under-researched genres, namely, research abstracts. The study establishes a native-speaker (British) baseline against which non-native (Egyptian) performance is compared. One hundred recently published conference abstracts are evenly divided over four contrastive categories: Eg(yptian)-Ling(uistics), Eg-N(uclear) Sc(ience), Br(itish)-Ling, and Br-NSc. Cross-group comparisons have been carried out to test three initial hypotheses illustrated in Table 1 below:

Table 1: The three initial hypotheses of the present research

H1

Disciplinary Effects: In both native and non-native groups, linguistics authors, whose discipline entails greater exposure to linguistic scrutiny, would display significantly higher levels of both SC and LD in their writing than their nuclear science counterparts.

H2

Language Nativity Effects: In each discipline, the native authors would outperform the non-native authors in terms of SC and LD.

H3

Correlation between syntactic complexity and lexical diversity scores: Since both are reflections of grammatical richness, it is expected that scores for SC and LD would correlate across all groups.

The first two hypotheses predict greater complexity by certain respective groups, but neither of them specifies which, if any, of the measures of SC or LD would be affected. The third hypothesis, which predicts no difference between the syntactic and lexical domains in the complexity profile across all groups, is supported by an earlier study by Douglas & Miller (2016; see Section 3). The results of testing these hypotheses are presented in Section 7 (Analysis) and discussed in Section 8 (Discussion).

 

3.      Previous Studies on Syntactic and Lexical Complexity

Syntactic and lexical complexity have been investigated either in combination (e.g. Douglas & Miller, 2016; Bulon et al., 2017) or independently (e.g. Lu, 2010, 2011, 2017; Ortega, 2015 for SC; Salazar, 2011; Kyle & Crossley, 2015 for lexical complexity) in L2 production, with more focus on writing than on speaking (Biber & Gray, 2010; Chen & Zechner, 2011).

When combined, syntactic and lexical complexity have often been found to correlate with one another as reflections of grammatical richness and/or development in L2 writing (Douglas & Miller, 2016). Consistent short-term (Storch & Tapper, 2009; Bulté & Housen, 2014; Schenker, 2016) and long-term (Bulon et al., 2017) syntactic and lexical gains in L2 writing proficiency have been reported. Storch and Tapper (2009), for example, found that upon completion of an English for Academic Purposes (EAP) course in an Australian university, graduate students’ writing showed improvement in terms of accuracy, use of academic vocabulary, and structure. On the other hand, a lack of significant improvement in both lexical and grammatical complexity (despite improvement in fluency) has been reported by Knoch, Rouhshad, Oon, & Storch (2015) in international students’ writing after three years of study in an English-medium Australian university. Commenting on the nature of their immersion experience, the students in Knoch et al.’s study reported in the interviews that “they were not required to do much writing in their degree studies and when they did, their lecturers almost exclusively commented on the content of their writing” (Knoch et al., 2015, p. 39). This suggests that the development of grammatical richness in learners’ writing requires conscious attention and practice. Differential findings on the correlation between syntactic and lexical complexity have been reported by Xudong, Cheng, Varaprasad, & Leng (2010), who investigated the impact of an EAP course on the development of the academic writing abilities of ESL/EFL graduate students at the National University of Singapore. They found that while students’ post-course essays contained more academic vocabulary, the essays did not progress in terms of grammatical accuracy and fluency. The findings concerning language complexity development thus remain inconclusive, perhaps due to the multi-factorial nature of this process.

As a dependent variable, the sensitivity of syntactic and/or lexical complexity in L2 writing has been explored in relation to L1 background (Lu & Ai, 2015), the specific L2 being learned (Bulon et al., 2017), level of pragmatic competence (Youn, 2014), the complexity of texts that learners most frequently read (Douglas & Miller, 2016), writing topic and writing quality (Yang et al., 2015), and cumulative experience (Ansarifar et al., 2018). Lu and Ai (2015) contrasted all 14 measures in the L2SCA (see Section 4) in 1400 EFL argumentative essays evenly representing seven L1s from four different language families: Sino-Tibetan (Chinese), Japonic (Japanese), Niger-Congo (Tswana), and Indo-European (Bulgarian, French, German, Russian), extracted from the International Corpus of Learner English. They found that, when all learner groups were collapsed into one NN group, only 3 of the 14 measures of syntactic complexity showed a difference from the (N)ative group. Yet, when the NN groups were disaggregated by L1 and compared against the N group, all 14 measures exhibited differences. They concluded that learners with different L1 backgrounds may not develop in the same ways in all dimensions of SC. Bulon et al. (2017) reported an L2-sensitive impact of Content and Language Integrated Learning (CLIL) education on the L2 proficiency of secondary school French-speaking Belgian pupils learning English and Dutch. Nearly all complexity measures improved significantly in the Dutch texts written by CLIL pupils, while only half of those measures showed significant development in the English texts. Douglas and Miller (2016) reported a strong correlation between the syntactic and lexical complexity of the texts 65 graduate students most frequently read for leisure on the one hand and that of those students’ writing on the other. Yang et al. (2015) investigated topic effect in 100 ESL argumentative essays by graduate students with heterogeneous L1s: Chinese, Korean, and Japanese.
They found that one of the topics (their future plans), which, according to Yang et al. (2015), would naturally demand causal reasoning in task performance, invited a greater amount of subordination (finite and non-finite) and greater global sentence complexity measured via mean length of sentence (MLS; see Section 4). The other topic (appearance), on the other hand, elicited more elaboration at the finite clause level, as reflected in more coordinate phrases and complex noun phrases. Furthermore, Yang et al. investigated the relation between SC and writing quality as judged by human raters using the TOEFL iBT Independent writing scoring guide. Higher scores correlated significantly with global sentence complexity and T-unit complexity measured via mean length of T-unit (MLTU; see Section 4). Ansarifar et al. (2018) compared research abstracts by MA-level L1 Persian writers, PhD-level L1 Persian writers, and published expert writers in the field of applied linguistics in terms of phrasal modification features. They found that while the (less experienced) MA-level writers differed significantly from the expert writers, the (more experienced) PhD-level writers did not. Ansarifar et al. concluded that academic writing becomes more complex with experience.

Few studies of L2 performance have paid attention to establishing a native-speaker baseline for interpreting nonnative-speaker performance (Foster & Tavakoli, 2009; Lu & Ai, 2015, discussed above). Foster & Tavakoli (2009) subjected the performance of both native and non-native speakers to equal scrutiny because they believed that “[i]f we investigate how learners perform language tasks, we should distinguish what performance features are due to their processing an L2 and which are due to their performing a particular task” (p. 866). They studied the writing of 100 learners of English: 40 learners from a variety of L1 backgrounds who were based in London (i.e. within the target language community) and 60 native speakers of Persian who were based in Tehran (i.e. outside the target language community). Foster & Tavakoli explored the influence of two narrative design features, namely complexity of storyline and tightness of narrative structure, on the complexity, fluency, accuracy, and lexical diversity of the language of both native and non-native speakers. They found that storyline complexity correlated with more subordinate (i.e. complex) language in both native and nonnative speakers. With regard to narrative structure, a tight structure (as opposed to a loose one) correlated with almost equally higher lexical diversity in the writing of both the native speakers and the London-based learners; the Tehran-based learners lagged behind.

Benchmarking native/proficient speakers’ performance has further enabled Wang and Slater (2016) to reveal the specific grammatical features where L2 performance diverged. Wang and Slater contrasted 38 written personal statements by Chinese non-English major college students with 15 personal statements by English proficient users (probably native speakers of English) extracted from the websites of a number of Canadian and American universities. The results indicated that Chinese EFL students used significantly fewer complex nominals, and shorter clauses and sentences than did the more proficient users (or rather, native speakers) of English.

In terms of data, most studies on complexity in written production have targeted academic (often argumentative) essays produced by L2 learners of different levels of proficiency (Lu, 2010, 2011, 2017; Lu & Ai, 2015; Yang, Lu, & Weigle, 2015). Very few studies have targeted other genres, such as cover letters for job applications written by adult graduate students (Douglas & Miller, 2016), e-mail exchanges (Schenker, 2016), personal statements by students as they enter college (Wang & Slater, 2016), and research abstracts (Ansarifar et al., 2018).

In the present study, written research abstracts by both native and nonnative speakers across two disciplines (Linguistics and Nuclear Science) have been subjected to equal scrutiny. If L1 performance is shown to be influenced by disciplinary effects in the same way as L2 performance is, then this would give stronger validity to disciplinary effects.

4.      Measuring Syntactic Complexity

Traditionally, SC has been linked to clausal rather than to phrasal complexity (Diessel, 2004 and Ravid & Berman, 2010, as cited in Yang et al., 2015; Givón, 2009). However, several L1 and L2 developmental studies have taken phrasal complexity, especially that of noun phrases (NPs), as a significant indicator of syntactic complexity in terms of the complexity of pre- and post-modification (see Ansarifar et al., 2018).

In the literature, SC indicators range from simple measures such as the number of words before the main verb, where simple sentences like “She laughs” or “The girl left,” with only one and two words before the main verb, respectively, would be contrasted with a sentence like “Thus, in syntactically simple English sentences there are few words before the main verb,” with seven words before the main verb (McNamara, Crossley, & McCarthy, 2010, p. 69), to more elaborate sets of measures that target both clausal and phrasal complexity in terms of the length of production units at all syntactic levels (phrase, clause, sentence, and T-unit[ii]), ratios of certain syntactic structures, and the amount of clausal coordination and subordination (Wolfe-Quintero, Inagaki, & Kim, 1998; Bulté & Housen, 2012, 2014, 2015; Ortega, 2015). Bulté & Housen (2014), for example, compiled a set of measures targeting sentential complexity via mean length of sentence (MLS), mean length of T-unit (MLTU), sentence types (simple, compound, complex), and coordination; clausal complexity via mean length of clause (MLC); and phrasal complexity through mean length of noun phrase (MLNP). Interestingly, Bulté & Housen (2014) refrained from using automatic computer-assisted tools like Coh-Metrix (see Note iii) and the L2 Syntactic Complexity Analyzer (L2SCA; this section), claiming that they created ‘computational noise’; instead, they segmented the sentences in their data manually and managed them in MS Excel sheets. Martinez (2018), on the other hand, adopted Bulté & Housen’s compiled set of measures and used Coh-Metrix, which, in its public versions, analyzes texts on over 200 measures of cohesion, language, and readability.
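The length-based measures named above (MLS, MLTU, MLC) are simple words-per-unit ratios. The following minimal Python sketch, using invented counts rather than data from this study, shows how such ratios would be computed once a text has been segmented manually, in the spirit of Bulté & Housen’s (2014) Excel-based workflow; it is an illustration only, not a substitute for the L2SCA:

```python
# Illustrative sketch only (not the L2SCA): length-based complexity
# measures computed from manually obtained counts. All counts below
# are invented for demonstration.

def mean_length(word_count, unit_count):
    """Mean length of a production unit, in words per unit."""
    return word_count / unit_count if unit_count else 0.0

# Suppose manual segmentation of a short text yielded these counts:
words, sentences, t_units, clauses = 120, 5, 6, 10

mls = mean_length(words, sentences)   # mean length of sentence (MLS)
mltu = mean_length(words, t_units)    # mean length of T-unit (MLTU)
mlc = mean_length(words, clauses)     # mean length of clause (MLC)

print(mls, mltu, mlc)  # 24.0 20.0 12.0
```

Because a sentence may contain several T-units and a T-unit several clauses, MLS ≥ MLTU ≥ MLC holds whenever the unit counts nest in this way.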

In the present study, the L2SCA, which has been described as providing “the most granular and comprehensive identification of writing samples” (Douglas and Miller, 2016, p. 4), is used. It is a web-based tool developed by Lu (2010, 2011) at Pennsylvania State University on the basis of an extensive review of the literature on SC. The software analyzes the data using the Stanford Parser and Tregex, producing results for 14 syntactic complexity indicators, including the length and density of several syntactic structures as well as the amount of coordination and subordination (see Table 2). The L2SCA is open to public use in its single mode at https://aihaiyang.com/software/l2sca/single/, which enables concurrent independent processing of up to two texts of a maximum of 1,000 words each.

Table 2: Syntactic Complexity Measures

Length of production unit
   1. MLC    Mean length of clause
   2. MLS    Mean length of sentence
   3. MLT    Mean length of T-unit

Amount of subordination
   4. C/T    Number of clauses per T-unit
   5. CT/T   Complex T-unit ratio
   6. DC/C   Number of dependent clauses per clause
   7. DC/T   Number of dependent clauses per T-unit

Amount of coordination
   8. CP/C   Number of coordinate phrases per clause
   9. CP/T   Number of coordinate phrases per T-unit
  10. T/S    Number of T-units per sentence

Amount of phrasal sophistication
  11. CN/C   Number of complex nominals per clause
  12. CN/T   Number of complex nominals per T-unit
  13. VP/T   Number of verb phrases per T-unit

Overall sentence complexity
  14. C/S    Number of clauses per sentence

(Lu, 2017, p. 503)

The L2SCA with its 14 measures was used by Lu and Ai (2015) and Wang and Slater (2016), and, with minor adaptations, by Yang et al. (2015), who selectively computed six measures (MLS, MLTU, MLC, TU/S, DC/TU, and CP/C) with the original version of the L2SCA. With regard to NP complexity, however, they modified the pattern used in the L2SCA to identify complex NPs, adopting a more inclusive characterization based on Biber, Gray, & Poonpon (2011): noun phrases that contain one or more of the following: pre-modifying adjectives, post-modifying prepositional phrases, and post-modifying appositives. In addition, Yang et al. (2015) calculated a new SC index, namely non-finite elements per clause (NFE/C), by subtracting 1 from the measure of verb phrases per clause (VP/C), since a clause should contain exactly one finite VP, and hence any other VPs would be non-finite.
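The derived NFE/C index can be illustrated with a short sketch; the counts are invented for demonstration, and the computation simply follows Yang et al.’s (2015) stated assumption of one finite VP per clause:

```python
# Hedged sketch of Yang et al.'s (2015) derived index: non-finite
# elements per clause (NFE/C), obtained by subtracting 1 from verb
# phrases per clause (VP/C), on the assumption that each clause
# contains exactly one finite VP. Counts are illustrative only.

def nfe_per_clause(vp_count, clause_count):
    vp_per_clause = vp_count / clause_count
    return vp_per_clause - 1.0  # the remaining VPs are taken as non-finite

# e.g. 15 VPs over 10 clauses -> VP/C = 1.5 -> NFE/C = 0.5
print(nfe_per_clause(15, 10))  # 0.5
```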

5.      Measuring Lexical Diversity

Lexical diversity (LD), which counts how many different words are used in a text, is one of two popular parameters of lexical complexity. The other is lexical density, which gives the ratio of content words (i.e. nouns, verbs, adjectives, and some adverbs) to the total words in a text. Both parameters have been used to characterize later lexical development in written production, as argued by Johansson (2008), who compared the two and concluded that they could be used interchangeably.

The traditional measure of lexical diversity is the type-token ratio (TTR), calculated as the ratio of different words (types) to the total number of words (tokens). It follows that TTR reaches its maximum value when the number of word types equals the total number of words (tokens), i.e. when every word type occurs just once across the whole text. In that case, the text is either very low in cohesion or very short. Hence, a shorter text will generally have a higher TTR value than a longer one. TTR is thus substantially affected by text length, which renders it reliable only when comparing (longer) texts of equal length (Johansson, 2008; Koizumi & In’nami, 2012). Since abstracts in general are fairly short texts, and since the abstracts under study display considerable variation in mean length (indicated by the high SD values in Table 3 and the statistically significant difference in mean length in Table 4, Section 6), TTR would be unreliable and is hence avoided in the present study.
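TTR’s sensitivity to text length can be demonstrated with a toy example; the two token lists below are invented for illustration, not data from this study:

```python
# Toy demonstration of TTR's sensitivity to text length: as a text
# grows, word types recur, so the type-token ratio falls even when
# the longer text is not lexically poorer.

def ttr(tokens):
    """Type-token ratio: distinct word types over total tokens."""
    return len(set(tokens)) / len(tokens)

short = "the model predicts complexity".split()
longer = ("the model predicts complexity and the model "
          "predicts diversity in the data").split()

print(ttr(short))   # 1.0 -- every type occurs exactly once
print(ttr(longer))  # lower, purely because types recur in a longer text
```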

Two popular measures of lexical diversity are VocD and MTLD. Both measures are generally claimed to overcome the problem of sensitivity to text length (Jarvis, 2013; McCarthy & Jarvis, 2010). The former, originally referred to as the D measure, was developed by Brian Richards and David Malvern (Richards & Malvern 1997, as cited in Johansson, 2008). It is calculated by plotting the predicted decline of the TTR as texts become longer, then comparing the resulting mathematical curve with empirical data from a text sample. MTLD was developed by McCarthy (2005; as cited in Koizumi & In’nami, 2012) and is calculated as “the mean length of word strings that maintain a criterion level of lexical variation” (McCarthy & Jarvis, 2010, p. 381). Since each of these two measures targets unique lexical information, researchers are advised to use them in combination, rather than using any single index (McCarthy & Jarvis, 2010).
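The MTLD procedure can be sketched as follows. This is a simplified, forward-only illustration of McCarthy & Jarvis’s (2010) description, using the conventional 0.72 criterion; the published measure also processes the text in reverse and averages the two runs, a step omitted here:

```python
# Simplified, forward-only sketch of MTLD: count the "factors" --
# word strings whose running TTR stays above a criterion level
# (conventionally 0.72) -- then divide total tokens by the factor
# count. The reverse pass and averaging of the full published
# measure are omitted for brevity.

def mtld_forward(tokens, threshold=0.72):
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1            # a full factor is complete
            types, count = set(), 0
    if count:                        # credit the partial final factor
        running_ttr = len(types) / count
        factors += (1 - running_ttr) / (1 - threshold)
    # With no completed factors (maximally diverse text), fall back
    # to the token count as the string length that sustained the TTR.
    return len(tokens) / factors if factors else float(len(tokens))
```

More repetitive text completes factors sooner and so yields a lower MTLD, which is the intended length-robust behaviour.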

TEXT INSPECTOR[iii] (https://textinspector.com/), the tool used in the current study, is a tagger and text complexity analyzer which was initially proposed and prototyped by Professor Stephen Bax and further developed by the software team at Versantus. In its free on-line version, it analyzes written texts individually offering, among other things, the two popular measures of lexical diversity: VocD and MTLD.

 

6.      Data and Methodology

The data is composed of 100 single-authored English research abstracts by native (British) and non-native (Egyptian) researchers. The abstracts have been randomly extracted from books of abstracts published from 2012 to 2018 by reputable national and international conferences in two disciplines: Linguistics and Nuclear Science. Extracting abstracts from full published papers has been avoided, since such abstracts would likely have undergone editorial modification and would hence be a less faithful reflection of a researcher’s own linguistic production. In addition, to avoid an undesirable overlap of individual and collective competencies, only single-authored abstracts have been targeted. This presented quite a challenge when collecting the Nuclear Science abstracts, where collaborative, rather than individual, research is the common practice. As a screening procedure, all abstracts have been checked for obvious grammatical errors using the MS Word grammar checker to filter out any abstract with a clustering of structural accuracy issues (see Table 5, Section 7 for the results of this screening).

Table 3 displays the details of the four evenly-represented datasets: Br(itish)-Ling(uistics), Eg(yptian)-Ling, Br-N(uclear) Sc(ience), and Eg-NSc.

 

Table 3: Details of datasets

                 No. of Abstracts   Mean Length   SD         WC
NN   Eg-Ling            25             219.2      53.80598    5480
     Eg-NSc             25             262.95     82.40595    5259
N    Br-Ling            25             298.2      71.20861    7455
     Br-NSc             25             284.35     64.10026    5687
All Grps               100                                   23881

Variation in the mean length of the abstracts under study has led to the avoidance of the TTR measure of lexical diversity, as mentioned in Section 5. It can be observed from Tables 3 and 4 that Br-Ling wrote significantly longer abstracts than did their Br-NSc counterparts; p values at or below 0.05 are treated as significant (Table 4). Within the N(on-)N(ative) groups, however, the situation is reversed: Eg-NSc wrote longer abstracts than Eg-Ling, but not significantly so. Examination of the Eg-NSc abstracts revealed that they often included reference material and data lines, which must have inflated the word count. The N groups (disaggregated by discipline; Br-Ling and Br-NSc) wrote longer abstracts than did their respective NN counterparts (Eg-Ling and Eg-NSc); however, the difference is statistically significant only in the linguistics discipline.

 

Table 4: Significance of mean differences in abstract length

(a) By language nativity within each discipline

             Means     p
Ling   Br    298.2     0.000002 *
       Eg    219.2
NSc    Br    284.35    0.208158
       Eg    262.95

(b) By discipline within each nativity group

             Means     p
Br     Ling  298.2     0.000285 *
       NSc   284.35
Eg     Ling  219.2     0.327686
       NSc   262.95

* significant at p ≤ 0.05

Lexical diversity has been analyzed through the web-based TEXT INSPECTOR (https://textinspector.com/), which provides, among other things, two popular measures of lexical diversity: VocD and MTLD (see Section 5). Syntactic complexity, on the other hand, has been measured using the web-based L2 Syntactic Complexity Analyzer (L2SCA; Lu, 2010; http://aihaiyang.com/software/l2sca/single/), with its 14 well-defined measures covering the length of production units as well as the amount of subordination, coordination, and phrasal sophistication (see Section 4).

Each of the 100 abstracts (see Table 3) was analyzed using the L2SCA and then TEXT INSPECTOR. The resulting syntactic complexity and lexical diversity scores were compiled manually in Excel sheets, where a set of independent-samples t tests was run to detect the significance of differences across disciplines and language nativity, and correlation tests were run to compare the lexical and syntactic complexity measures.
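For illustration, the two statistical steps just described (run in MS Excel for this study) can be reproduced with textbook formulas. The sketch below is generic, not the study’s actual worksheet; the score lists one would pass in are per-abstract values of a given measure:

```python
# Generic sketch of the study's statistical procedure: a pooled-variance
# independent-samples t statistic and Pearson's r, written with the
# standard textbook formulas. Input lists are per-abstract measure scores.
import math

def t_statistic(a, b):
    """Independent-samples t statistic with pooled variance; a p value
    would then be read from a t distribution with len(a)+len(b)-2 df."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / (pooled * math.sqrt(1 / na + 1 / nb))

def pearson_r(xs, ys):
    """Pearson correlation between paired scores (e.g. SC vs. LD)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```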

 

7.      Analysis

The analysis of the results obtained for the syntactic complexity and lexical diversity measures is organized according to the three initial hypotheses presented in Section 2 (Table 1). The first hypothesis (H1), predicting disciplinary effects, is motivated by the fact that linguists would naturally be more language-focused and are hence expected to display greater grammatical richness in their writing. The second hypothesis (H2) predicts language nativity effects. The third hypothesis (H3) predicts a correlation between the scores for syntactic complexity and lexical diversity within the same group, since both are reflections of grammatical richness.

Illustrative examples extracted from the data are included in the analysis in sub-sections 7.1, 7.2, and 7.3. Each example is presented twice: first with the relevant tokens marked via distinctive styles of underlining, and then with a generative syntactic analysis via labelled bracketing (Carnie, 2013) to reveal the structure.

Initial grammar check screening of all abstracts through MS Word tools revealed very few grammatical errors across the four data sub-sets as shown in Table 5.

Table 5: Grammatical errors identified by MS Word grammar check in all abstracts

N groups

Br-Ling: None

Br-NSc:
  1. Number agreement at phrasal level (1 occurrence): “… enabling realization of *a scalable quantum devices …” (Abstract 12)
  2. Missing hyphenation that affects the structure (1 occurrence): “These heavy *element forbidden lines are routinely used to determine electron temperatures and densities, …” (Abstract 1)
  3. Two tensed verbs in one clause (1 occurrence): “It *is has been shown that …” (Abstract 19)

NN groups

Eg-Ling:
  1. Subject-verb agreement (1 occurrence): “The corpus of the study *are 28 Text-Image Showcases …” (Abstract 6)
  2. Number agreement at phrasal level (1 occurrence): “… from a highbrow to *a lowbrow values through the analysis” (Abstract 12)

Eg-NSc:
  1. Subject-verb agreement (2 occurrences): “The present study *describe the use of FO technique for water desalination …”; “Different advanced *apparatus are used for such measurements.” (Abstract 4)
  2. Noun/verb form (1 occurrence): “How the geometry of the radiation sources can *effect the shielding design.” (Abstract 6)
  3. Verb form (1 occurrence): “… and these will overcomes.” (Abstract 4)

This limited range of grammatical errors in the data is in line with the fact that the abstracts have been produced by mature writers who, even if non-native, have at least obtained a PhD degree in an English-medium discipline. Furthermore, the task of writing a research abstract is generally expected to involve careful revision on the part of authors to weed out any obvious grammatical errors.



[i] In their taxonomic model of complexity, Bulté and Housen (2012) distinguish between cognitive complexity and absolute complexity. The former is concerned with the mental ease/difficulty by which learners acquire a certain language feature. The latter, absolute complexity, encompasses linguistic complexity, which is the focus of this research, along with propositional complexity and discourse-interactional complexity.

[ii] A T-unit or the “minimal terminable unit” was introduced by Hunt (1965) as a unit of syntactic measurement that consists of one independent clause with all of its dependent (subordinate) clauses. It differs from a sentence in the fact that a sentence may contain a set of coordinate independent clauses.

[iii] Other tools include Linguistic Inquiry and Word Count (LIWC; http://www.liwc.net/tryonline.php; Pennebaker, Booth, & Francis, 2007), which calculates the frequencies of self-reference, social and cognitive words as well as words denoting positive and negative emotions; Coh-Metrix (Graesser, McNamara, Louwerse, & Cai, 2004; McNamara, Crossley, & McCarthy, 2010; McNamara & Graesser, 2012), which, in its public versions, analyzes texts on over 200 measures of cohesion, language, and readability; and the Tool for the Automatic Analysis of LExical Sophistication (TAALES; Kyle & Crossley, 2015), which calculates text scores for 135 lexical indices related to word frequency, range, academic language, and psycholinguistic information.

References

Alqinai, J. (2013). Mediating punctuation in English Arabic translation. Linguistica Atlantica, 32, 2-20.
Ansarifar, A., Shahriari, H., & Pishghadam, R. (2018). Phrasal complexity in academic writing: A comparison of abstracts written by graduate students and expert writers in applied linguistics. Journal of English for Academic Purposes, 31, 58-71.
Averbeck, J. M., & Miller, C. (2014). Expanding Language Expectancy Theory: The Suasory effects of lexical complexity and syntactic complexity on effective message design. Communication Studies, 65(1), 72–95.
Bardovi-Harlig, K. (1992). A second look at T-Unit analysis: Reconsidering the sentence. TESOL Quarterly, 26(2), 390-395.
Biber, D., & Gray, B. (2010). Challenging stereotypes about academic writing: Complexity, elaboration, explicitness. Journal of English for Academic Purposes, 9, 2-20.
Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45, 5–35. doi:10.5054/tq.2011.244483
Bulon, A., Hendrikx, I., Meunier, F., & Van Goethem, K. (2017). Using global complexity measures to assess second language proficiency: Comparing CLIL and non-CLIL learners of English and Dutch in French-speaking Belgium. Travaux du CBL, 11(1), 1-25. https://sites.uclouvain.be/bkl-cbl/wp-content/uploads/2017/04/Bulon_et_al_2017.pdf.
Bulté, B., & Housen, A. (2012). Defining and operationalising L2 complexity. In A. Housen, F. Kuiken, & I. Vedder (Eds.), Dimensions of L2 performance and proficiency – Investigating complexity, accuracy and fluency in SLA (pp. 21–46). Amsterdam: John Benjamins.
Bulté, B., & Housen, A. (2014). Conceptualizing and measuring short-term changes in L2 writing complexity. Journal of Second Language Writing, 26, 42-65.
Bulté, B., & Housen, A. (2015). Evaluating short-term changes in L2 complexity development. Círculo de Lingüística Aplicada a la Comunicación, 63, 42–76.
Burgoon, M. (1995). Language expectancy theory: Elaboration, explication, and extension. In C. R. Berger & M. Burgoon (Eds.), Communication and social influence processes (pp. 29–52). East Lansing, MI: Michigan State University Press.
Carnie, A. (2013). Syntax: A generative introduction. UK, USA: Wiley Blackwell Publishers Ltd.
Chen, M., & Zechner, K. (2011). Computing and evaluating syntactic complexity features for automated scoring of spontaneous non-native speech. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, pp. 722–731. www.aclweb.org/anthology/P11-1073.
Crossley, S. A., & McNamara, D. S. (2012). Detecting the first language of second language writers using automated indices of cohesion, lexical sophistication, syntactic complexity and conceptual knowledge. In S. Jarvis, & S. Crossley (Eds.), Approaching language transfer through text classification: Explorations in the detection-based approach (pp. 106–126). Bristol: Multilingual Matters.
Crossley, S. A., & McNamara, D. S. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing, 26, 66–79.
Dickins, J. (2017). The pervasiveness of coordination in Arabic, with reference to Arabic>English translation. Languages in Contrast, 17(2), 229-254.
Diessel, H. (2004). The acquisition of complex sentences. Cambridge, England: Cambridge University Press.
Douglas, Y., & Miller, S. (2016). Syntactic and lexical complexity of reading correlates with complexity of writing in adults. International Journal of Business Administration, 7(4), 1-10. https://doi.org/10.5430/ijba.v7n4p1
Ellis, R. (2003). Task-based language learning and teaching. Oxford University Press.
Foster, P., & Tavakoli, P. (2009). Native speakers and task performance: Comparing effects on complexity, fluency, and lexical diversity. Language Learning, 59(4), 866–896.
Givón, T. (2009). The genesis of syntactic complexity: Diachrony, ontogeny, neuro-cognition, evolution. Amsterdam: John Benjamins.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193–202. https://pdfs.semanticscholar.org/cb42/704113df4cbff874806cdf4a9f05d5f8065f.pdf.
Housen, A., & Kuiken, F. (2009). Complexity, accuracy and fluency in second language acquisition. Applied Linguistics, 30, 461–473.
Hunt, K. (1965). Grammatical structures written at three grade levels. National Council of Teachers of English (NCTE) Research report No. 3. Champaign, IL, USA: NCTE. ftp://128.91.234.106/papers/faculty/beatrice_santorini/hunt-1965.pdf.
Jarvis, S. (2013). Capturing the diversity in lexical diversity. Language Learning, 63, 87–106.
Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental perspective. Lund University Working Papers, 53, 61-79. http://journals.lub.lu.se/index.php/LWPL/article/download/2273/1848.
Kalantari, R., & Gholami, J. (2017). Lexical complexity development from dynamic systems theory perspective: Lexical density, diversity, and sophistication. International Journal of Instruction, 10(4), 1-18.
Knoch, U., Rouhshad, A., Oon, S. P., & Storch, N. (2015). What happens to ESL students’ writing after three years of study at an English medium university? Journal of Second Language Writing, 28, 39–52.
Koizumi, R., & In’nami, Y. (2012). Effects of text length on lexical diversity measures. System, 40, 522–532. doi: http://dx.doi.org/10.1016/j.system.2012.10.017.
Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757-786.
Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15, 474–496.
Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 45(1), 36-62.
Lu, X. (2017). Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Language Testing, 34(4), 493-511.
Lu, X., & Ai, H. (2015). Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds. Journal of Second Language Writing, 29, 16-27.
Martínez, A. C. L. (2018). Analysis of syntactic complexity in secondary education EFL writers at different proficiency levels. Assessing Writing, 35, 1–11.
McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
McNamara, D. S., Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of writing quality. Written Communication, 27(1), 57–86.
McNamara, D. S., & Graesser, A. C. (2012). Coh-Metrix: An automated tool for theoretical and applied natural language processing. In P. M. McCarthy & C. Boonthum-Denecke (Eds.), Applied natural language processing and content analysis: Identification, investigation, and resolution (pp. 188–205). Hershey, PA: IGI Global. ftp://129.219.222.66/Publish/pdf/McNamara_Graesser_Coh-Metrix.pdf.
Miller, M. D., & Burgoon, M. (1979). The relationship between violations of expectations and the induction of resistance to persuasion. Human Communication Research, 5, 301–313.
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30, 555–578.
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: a research synthesis of college-level L2 writing. Applied Linguistics, 24, 492–518.
Ortega, L. (2015). Syntactic complexity in L2 writing: Progress and expansion. Journal of Second Language Writing, 29, 82-94.
Oshima, A., & Hogue, A. (1998). Writing academic English. Third Edition. London: Longman.
Othman, W. (2004). Subordination and coordination in English-Arabic translation. Al-Basaer, 8(2), 12-33. http://www.translationdirectory.com/article899.htm.
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). LIWC2007: Linguistic inquiry and word count. Retrieved from
Price, B. D. (1974). Noun overuse phenomenon article. The Language Quarterly, 2(4), 29-37.
Ravid, D., & Berman, R. A. (2010). Developing noun phrase complexity at school age: A text-embedded cross-linguistic analysis. First Language, 30, 3–26.
Richards, B. J., & Malvern, D. (1997). Quantifying lexical diversity in the study of language development. Reading: Faculty of Education and Community Studies.
Salazar, D. J. L. (2011). Lexical bundles in scientific English: A corpus-based study of native and non-native writing. PhD dissertation. University of Barcelona.
Schenker, T. (2016). Syntactic complexity in a cross-cultural E-mail exchange. System, 63, 40-50.
Skehan, P. (1989). Individual differences in second language learning. London: Edward Arnold.
Storch, N., & Tapper, J. (2009). The impact of an EAP course on postgraduate writing. Journal of English for Academic Purposes, 8, 207-223.
Wang, S., & Slater, T. (2016). Syntactic complexity of EFL Chinese students’ writing. English Language and Literature Studies, 6(1), 81-86.
Wolfe-Quintero, K., Inagaki, S., & Kim, H. Y. (1998). Second language development in writing: Measures of fluency, accuracy, & complexity. Honolulu, HI: University of Hawaii Press.
Xudong, D., Cheng, L. K., Varaprasad, C., & Leng, L. M. (2010). Academic writing development to ESL/EFL graduate students in NUS. Reflections on English Language Teaching, 9(2), 119–138.
Yang, W., Lu, X., & Weigle, S. C. (2015). Different topics, different discourse: Relationships among writing topic, measures of syntactic complexity, and judgments of writing quality. Journal of Second Language Writing, 28, 53-67.
Youn, S. J. (2014). Measuring syntactic complexity in L2 pragmatic production: Investigating relationships among pragmatics, grammar, and proficiency. System, 42, 270–287.