Based on materials developed by Lisa Pearl (http://www.socsci.uci.edu/~lpearl/ )
1 Introducing CHILDES [10 points]
1.1 Background & Reading: the main website https://childes.talkbank.org/
CHILDES stands for CHIld Language Data Exchange System, and is one of the most useful freely available sources of empirical data on child language input and output. Some highlights:
• Browsable transcripts: https://sla.talkbank.org/TBB/childes
• Information about downloading corpora: https://talkbank.org/share/data.html
Downloading corpora is straightforward: follow the link to the corpus you’re interested in and click on it.
• For example, check to make sure you would know how to download the “Frog Story” corpus from Dan Slobin. What information do you see about what is contained in the corpus, and how the corpus needs to be cited?
• Click on the link that takes you to the “Frog Story” methods.
o The link to the frog story is broken. Here’s a video of the book: https://www.youtube.com/watch?v=BwDc3aOb-E0&ab_channel=AmandaThorp
o What kinds of questions could you ask about language development, using this corpus?
Take a moment now to browse through some of the available corpora. Note the languages, number of children, ages of participants, and other characteristics.
• Identify at least five languages that CHILDES provides child language data for.
• Find the clinical corpora. What disorders are represented? Which ones are available for English?
• If the audio is provided, you should be able to click on the media folder and listen to the recordings. (Try to find some audio recordings now.) What is the name of one of the audio recordings?
• Consider the American English data available, as described in the American English manual here: https://childes.talkbank.org/access/Eng-NA/ . Identify three corpora that include data directed at children between the ages of 2 and 4 years old. What kinds of interactions were included in the corpus?
o Corpus 1:
o Corpus 2:
o Corpus 3:
Create a folder where you’re going to store the downloaded corpora.
CLAN Background
CLAN stands for Computerized Language ANalysis, and is a freely available tool provided through the CHILDES project. From the CLAN manual: “It is a program that is designed specifically to analyze data transcribed in the format of the Child Language Data Exchange System (CHILDES). Leonid Spektor at Carnegie Mellon University wrote CLAN and continues to develop it. The current version uses a graphic user interface and runs on both Macintosh and Windows machines… CLAN allows you to perform a large number of automatic analyses on transcript data. The analyses include frequency counts, word searches, co-occurrence analyses, MLU {Mean Length of Utterance} counts, interactional analyses, text changes, and morphosyntactic analysis.”
1.2 Install CLAN on your computer & get the latest CLAN manual
The website containing downloadable install programs and source code for CLAN, along with a link to the current CLAN manual:
https://talkbank.org/manuals/CLAN.pdf
Read section 1.3 or 1.4 (“Installing CLAN”) in the CLAN manual for detailed installation instructions for your computer’s operating system.
In general, it pays to keep the manual handy as a reference, though the CLAN program itself also has ways for you to get help directly.
2.0 A quick CLAN tutorial. [10 points]
To familiarize yourself with the basic layout of the CLAN program, work through sections 3.1-3.3 of the CLAN manual (pp. 12-19).
– Note: I get a slightly different output on some of the analyses. If your results are a little different, or organized differently, don’t worry too much.
– Paste your output here. (Screenshots or copy/paste)
o 3.3.1 Sample KWAL Run
o 3.3.2 Sample FREQ Run
o 3.3.3 Sample MLU Run
o 3.3.4 Sample COMBO Run
o 3.3.5 Sample GEM and GEMFREQ Runs
Look through the list of Advanced commands (section 3.4, pp. 19-22).
– Identify 2 new commands that you haven’t seen before, and provide a brief description of what each command does.
• Command #1:
• Command #2:
3.0 Brainstorm [10 points]
I want you to browse through the databases, thinking broadly about what data are here, and what questions you could ask about these data. Identify at least 3 different topics, the ages you would be looking at, and which kinds of corpora might be available to answer your question.
Topic Ages Possible Corpora
4.0 Your main assignment, in 4 parts: [50 points]
1) Create a question, and answer that question using the CLAN tools/CHILDES database. [20 points]
2) Locate other published work (at least one peer-reviewed journal article) that addresses the same question. [10 points]
3) Create a short (5-minute) presentation that explains how you did your analysis, what you found, and how that fits with the broader literature. [15 points]
4) Watch at least 3 of your peers’ presentations, and comment on them on our shared Mediaspace channel. [5 points]
More detail on each of these components:
1) For your question, you need to ask something about a child’s syntactic development, or about how adults’ grammar changes as the child matures. For example, you could ask:
a. At what average age do kids start using the preposition “from”, and when they do use it, how complex are the initial sentences in which they use it?
b. Do adults change the length of their utterances as children grow older?
c. When do kids start using reflexive pronouns?
d. Do Chinese-English bilingual children use fewer definite articles (e.g., “the”) compared to monolingual English-speaking children?
Download the appropriate corpora into a folder that you use as your working directory. You will also want to keep a spreadsheet of the corpora you’re using, the kids’ ages, and any other relevant facts. That’s also a great place to put the results of your analyses. If you will be using commands in CLAN that were not mentioned in that first tutorial, do a search through the manual to see if CLAN will give you a way to answer that question.
Note: if you are interested in a project regarding phonology, you can either use PhonBank, or you can create your own transcripts based on video/audio. Please consult with me before making this choice!
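If you would like to double-check the numbers that go into your spreadsheet, the following is a minimal Python sketch, not a replacement for CLAN. It assumes the usual CHAT layout, in which main speaker tiers start with "*" (for example "*CHI:"), dependent tiers start with "%", headers start with "@", and continuation lines start with a tab. The names SAMPLE, summarize, and write_rows are hypothetical, made up for illustration, and the word filter is deliberately crude. Note that it counts words, whereas CLAN’s MLU command works in morphemes, so the two will not give identical values.

```python
import csv
import re
import sys

# Crude word filter (an assumption of this sketch): keep alphabetic tokens only,
# which drops punctuation, bracketed codes, and timing marks.
WORD = re.compile(r"^[A-Za-z][A-Za-z'_+\-]*$")
UNINTELLIGIBLE = {"xxx", "yyy", "www"}  # CHAT placeholders for unclear speech
FIELDS = ["speaker", "utterances", "mean_words", "keyword", "keyword_count"]

# Tiny made-up transcript so the sketch runs without any downloaded corpus.
SAMPLE = "\n".join([
    "@Begin",
    "@Participants:\tCHI Target_Child, MOT Mother",
    "*MOT:\twhere did that come from ?",
    "%mor:\t(dependent tier, ignored by this sketch)",
    "*CHI:\tfrom the box .",
    "*CHI:\txxx .",
    "*CHI:\tI got it from",
    "\tthe big box .",
    "@End",
])


def main_tier_utterances(chat_text):
    """Yield (speaker, tokens) for each main-tier utterance in a CHAT transcript."""
    speaker, tokens = None, []
    for line in chat_text.splitlines():
        if line.startswith("*"):
            if speaker is not None:
                yield speaker, tokens
            head, _, rest = line.partition(":")
            speaker, tokens = head[1:], rest.split()
        elif line.startswith("\t") and speaker is not None:
            tokens.extend(line.split())  # continuation of the current main tier
        else:
            # Header (@) or dependent tier (%): close any open utterance.
            if speaker is not None:
                yield speaker, tokens
            speaker, tokens = None, []
    if speaker is not None:
        yield speaker, tokens


def summarize(chat_text, speaker="CHI", keyword="from"):
    """Count utterances, mean length in words, and keyword hits for one speaker."""
    lengths, hits = [], 0
    for who, tokens in main_tier_utterances(chat_text):
        if who != speaker:
            continue
        words = [t.lower() for t in tokens
                 if WORD.match(t) and t.lower() not in UNINTELLIGIBLE]
        if not words:
            continue  # skip utterances with no countable words
        lengths.append(len(words))
        hits += words.count(keyword)
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    return {"speaker": speaker, "utterances": len(lengths),
            "mean_words": round(mean_len, 2),
            "keyword": keyword, "keyword_count": hits}


def write_rows(rows, handle):
    """Write summary rows as CSV so they can be pasted into a spreadsheet."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    rows = [summarize(SAMPLE, speaker=s, keyword="from") for s in ("CHI", "MOT")]
    write_rows(rows, sys.stdout)
```

To try this on a real transcript, replace SAMPLE with the text of a downloaded .cha file (for example, read with pathlib’s Path(...).read_text(encoding="utf-8")). Treat the output as a sanity check alongside the CLAN commands you report, since CLAN handles many transcription conventions that this sketch ignores.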
2) For the published article you use as a comparison, you need to determine whether your results match what’s been published and, if they differ, what kind of difference there is. For the purposes of this project, you’ll just need to find one other paper to use as a benchmark/comparison.
3) Presentations will be approximately 5 minutes, using this basic template. For each presentation, briefly tell us:
a. what your question was
b. what the related article tells us about the topic
c. how you did the search (both the string and the set of corpora used)
d. what you found
e. how this relates to the article you found
f. citations for your article, for the corpus you used, and for CHILDES/CLAN
g. You will record your presentation and upload it to the Mediaspace channel for the class.
4) You will also need to share with me a folder with your completed CHILDES background work from this document, plus the documentation from your project (including the spreadsheet of data, a record of commands you ran, a folder with your transcripts, and the article you’re citing). You can do that either by sharing a folder in Google Drive, or by zipping the folder and sending it via email.
5) If you completed this in pairs, you will also need to send me a paragraph via email, indicating how you and your partner split the work, and what you contributed to the project.