The "Bamberg Survey of Language Variation and Change" Project (DFG-funded)

The Bamberg Survey of Language Variation and Change (BSLVC) is a digital database of lexical and grammatical preferences of speakers of different varieties of English. The current project is an interdisciplinary collaboration between linguists (Chair of English Linguistics, University of Bamberg) and statisticians (Chair of Mathematics in Economics and Statistics and Econometrics, University of Bamberg). It is funded by the Deutsche Forschungsgemeinschaft (DFG) from 1 October 2024 until 30 September 2027.

With this project, we aim to create a digital database for comparative studies of lexical and grammatical preferences in different varieties of English. The BSLVC project was initiated in 2008 with a focus on language contact between English and Romance languages. The focus has since expanded to include regions where English is used as a first language (English as a native language, ENL), including England, Wales, Scotland, Australia and the USA, as well as regions where English is spoken as a second language (English as a second language, ESL, e.g. Malta, Gibraltar, Puerto Rico) or as a foreign language (English as a foreign language, EFL, including Germany, Slovenia and Sweden).

To date (as of January 2025), we have collected around 400,000 lexical data points and around 250,000 grammatical data points from over 6,000 questionnaires.

Upon publication, the BSLVC project will transition into a crowdsourcing phase, where external scholars working on specific varieties of English will have the opportunity to contribute to the database through local data collection and field work. In addition, the Ethics Council of the University of Bamberg has certified that the BSLVC project is ethically unobjectionable.

Questionnaires as data sources

Questionnaires often exhibit a narrow linguistic focus, contributing to the scarcity of versatile survey databases. Three notable exceptions – the World Atlas of Language Structures Online (Dryer & Haspelmath 2022), The Atlas of Pidgin and Creole Language Structures (Michaelis et al. 2022), and the electronic World Atlas of Varieties of English (eWAVE; Kortmann et al. 2020) – have been well-received in the linguistic community. However, they reflect expert judgments on the global status of features in a variety (e.g. absent, rare, pervasive). While such survey data holds considerable encyclopaedic value, it falls short as a primary source for quantitative studies investigating patterns of variation within a variety, particularly concerning social and stylistic dimensions.

This contrasts with the Bamberg Survey of Language Variation and Change (BSLVC), which is growing into a large-scale, empirically grounded questionnaire database that meaningfully complements (i) expert judgments recorded in eWAVE (which to date includes no EFL varieties and only a few of our project varieties), and (ii) the prevalent corpus-based approach to the study of varieties of English. In terms of empirical scope, it explicitly addresses some of the weaknesses of corpus approaches outlined above and can inform the larger debate on the modelling of English world-wide (see Deshors 2018; Hundt 2021; Buschfeld & Kautzsch 2022). Mixed-method designs and purpose-built instruments such as questionnaires (cf. Dollinger 2015), which are used in the BSLVC, offer the following benefits for studying language variation and change:

i) They yield comparable data for different varieties of English, therefore facilitating more robust cross-varietal analyses. 

ii) The elicitation of comprehensive sociodemographic information from respondents allows linguists to shed light on the social dimension of language variation. 

iii) Through appropriate survey design and item formulation, informant responses may be obtained for almost any feature.

Questionnaire data come with their own set of challenges. Their complex structure and ranked response formats require specific analytical approaches, and user-friendly tools for handling them are currently scarce. In this context, versatile survey databases like the BSLVC can help. Though not yet published, the BSLVC stands out for its strong empirical foundation and wide coverage, offering valuable potential for researchers looking to go beyond traditional data sources.

Funding

The BSLVC currently receives funding from the Deutsche Forschungsgemeinschaft (DFG) from 1 October 2024 until 30 September 2027 (Grant: 548274092). It previously received funding from the Bavarian Ministry of Science and the Arts from 2008 until 2014, as well as from the Spanish Ministry of Education and Science with the European Regional Development Fund (ERDF) (Grant: HUM2007-60706/FILO).

Data Collection

The questionnaire consists of three parts:

The first part collects socio-demographic data of the participants (including age, gender, educational and professional background). 

The second part investigates lexical preferences for British-American (near-)synonyms (e.g. pavement-sidewalk, nappies-diapers).

The third part collects usage ratings for 345 sentences (138 spoken, 207 written). In the spoken section, participants listen to recorded sentences read aloud by a native speaker of the target variety. Both the spoken and written stimuli are presented in two distinct contextual settings that reflect different levels of formality:

(i) an informal conversation among friends

(ii) a formal email to a former teacher

The sentences include a wide range of grammatical features such as double comparatives, zero marking of third person singular, and sentence-final but, ensuring that the database can support a diverse array of linguistic research questions. The BSLVC database can then provide quantitative information on the distribution of a feature and reveal social (e.g. age, gender) and stylistic aspects. The form of rating task used in the questionnaire yields data on an ordinal, Likert-type response scale.
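As a rough illustration of the kind of quantitative summary such ordinal ratings allow, the following Python sketch groups ratings by variety, context and feature and reports the median and the share of responses at each scale point. It is not the project's analysis code: the five-point scale, the record layout, the function name summarise and all values are assumptions made for this example. Medians and response shares are used rather than means because the scale is ordinal.

```python
from collections import Counter, defaultdict
from statistics import median_low

# Hypothetical records: (variety, context, feature, rating).
# The 1-5 scale and every value below are invented for illustration.
RATINGS = [
    ("Malta", "informal", "double comparative", 4),
    ("Malta", "informal", "double comparative", 5),
    ("Malta", "informal", "double comparative", 2),
    ("Malta", "formal", "double comparative", 1),
    ("Malta", "formal", "double comparative", 2),
    ("Malta", "formal", "double comparative", 1),
]


def summarise(records, scale=(1, 2, 3, 4, 5)):
    """Group ratings by (variety, context, feature).

    For each group, return the number of ratings, the low median
    (always an actual scale point) and the share of responses
    at each point of the assumed scale.
    """
    groups = defaultdict(list)
    for variety, context, feature, rating in records:
        groups[(variety, context, feature)].append(rating)

    summary = {}
    for key, values in groups.items():
        counts = Counter(values)
        summary[key] = {
            "n": len(values),
            "median": median_low(values),
            "shares": {point: counts[point] / len(values) for point in scale},
        }
    return summary


if __name__ == "__main__":
    for key, stats in summarise(RATINGS).items():
        print(key, stats["n"], stats["median"], stats["shares"])
```

Adding a sociodemographic field such as an age group to the grouping key would give the same summary along the social dimension.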

Given that the varieties that are the focus of the BSLVC have not been subject to much research, data collection on site is an essential component of the project. Data is primarily collected through face-to-face interviews conducted during field trips. While most of the data has been collected using a standardized paper questionnaire so far, a digital version has also been developed recently. The questionnaires consist of several sections that can be used independently of each other. 

Participation in the questionnaire is voluntary and anonymous, and participants are informed that they may withdraw at any point. Students participating in the full questionnaire (lexicon and grammar) are typically paid €10 (or equivalent). Members of the BSLVC team have always obtained ethical clearance for the complete questionnaire at those universities that required ethics approvals (e.g. Malta, Edinburgh). The elicited sociodemographic information does not contain personal data that allow the identification of individuals.

Digitalisation

Completed paper questionnaires are scanned, digitised using Optical Mark Recognition (OMR) software and checked manually by student assistants of the Chair of English Linguistics. Additionally, a digital version of the questionnaire based on LimeSurvey is used. Finally, the data from both sources are normalised and merged into a database. Data analysis and visualization are mainly done in R and Python.
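The normalisation and merging step can be pictured with a minimal Python sketch. It assumes, purely for illustration, that the scanned paper sheets and the digital questionnaire arrive as rows with different column names and answer codings, and converts both into one long-format list of records. All field names, item codes and answer codes below are hypothetical and do not describe the actual export formats or the project's pipeline.

```python
# Hypothetical raw rows; field names and codings are assumptions.
OMR_ROWS = [
    {"sheet": "0412", "G001": "3", "G002": ""},  # "" = no mark detected
]
DIGITAL_ROWS = [
    {"id": 7, "grammar[G001]": "A2", "grammar[G002]": "A5"},
]


def normalise_omr(rows):
    """Turn one row per scanned sheet into one record per rated item."""
    for row in rows:
        for key, value in row.items():
            if key == "sheet":
                continue
            yield {
                "respondent": f"omr-{row['sheet']}",
                "item": key,
                "rating": int(value) if value else None,
                "source": "paper",
            }


def normalise_digital(rows):
    """Same target layout; strips the assumed 'grammar[...]' wrapper
    and the assumed 'A' prefix of the answer codes."""
    for row in rows:
        for key, value in row.items():
            if key == "id":
                continue
            item = key[key.index("[") + 1 : key.index("]")]
            yield {
                "respondent": f"ls-{row['id']}",
                "item": item,
                "rating": int(value[1:]) if value else None,
                "source": "digital",
            }


def merge(*streams):
    """Combine record streams, keeping one record per (respondent, item)."""
    merged = {}
    for stream in streams:
        for record in stream:
            merged[(record["respondent"], record["item"])] = record
    return sorted(merged.values(), key=lambda r: (r["respondent"], r["item"]))


if __name__ == "__main__":
    for record in merge(normalise_omr(OMR_ROWS), normalise_digital(DIGITAL_ROWS)):
        print(record)
```

Keeping unmarked items as None rather than dropping them preserves the distinction between a missing response and a low rating.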

References

Buschfeld, Sarah & Alexander Kautzsch. 2022. Modelling World Englishes: A joint approach to Postcolonial and non-postcolonial varieties. In Modelling World Englishes. Edinburgh University Press. https://doi.org/10.1515/9781474445887

Deshors, Sandra C. (ed.). 2018. Modeling World Englishes: Assessing the interplay of emancipation and globalization of ESL varieties (Varieties of English Around the World G61). Amsterdam: John Benjamins. https://doi.org/10.1075/veaw.g61

Dollinger, Stefan. 2015. The written questionnaire in social dialectology: history, theory, practice (IMPACT volume 40). Amsterdam, Philadelphia: John Benjamins. https://doi.org/10.1075/impact.40

Dryer, Matthew & Martin Haspelmath. 2022. The World Atlas of Language Structures Online. Zenodo. https://doi.org/10.5281/ZENODO.7385533

Hundt, Marianne. 2021. On models and modelling. World Englishes 40(3). 298–317. https://doi.org/10.1111/weng.12467

Kortmann, Bernd, Kerstin Lunkenheimer & Katharina Ehret (eds.). 2020. The Electronic World Atlas of Varieties of English. Zenodo. https://zenodo.org/record/3712132

Kortmann, Bernd & Edgar Schneider (eds.), with Kate Burridge, Rajend Mesthrie & Clive Upton. 2004. A handbook of varieties of English. Vol. 1: Phonology; Vol. 2: Morphology and syntax. Berlin, New York: Mouton de Gruyter. https://doi.org/10.1515/9783110197181

Michaelis, Susanne Maria, Philippe Maurer, Martin Haspelmath, Magnus Huber & Robert Forkel (eds.). 2022. Atlas of Pidgin and Creole Language Structures Online. Oxford: Oxford University Press. http://apics-online.info

 

The references listed here refer to all subpages regarding the BSLVC.