Notes from a Linguistic Mystic

Doctor Vowels!

I’ve now passed my Doctoral Dissertation defense, and made the revisions requested by my committee, so I’m officially done with graduate school. If you’re curious, I talk more about my dissertation here, and you can read the abstract here. Now, I’m moving on to a new position at the University of Michigan as a Post-Doctoral Research Fellow.

Although I’ve loved my time at the University of Colorado, I’m really thrilled for this next opportunity, and will keep you all in the loop as more fun research comes my way.

This site is getting a surprising number of hits (400-500 daily), and I’ve put a lot of effort into some of these posts. So, to avoid any future difficulty, I’ve decided to clarify the license of the content posted here.

As you can see from the bottom of the page, I’ve chosen to put this content under a Creative Commons Attribution-ShareAlike (4.0) license, just as I chose for my Praat guidebook. This license says three important things:

1. You may share, copy, redistribute, or adapt the content however you see fit. The content belongs to the public, as any academic content should.

2. If you do re-use or modify this content, you have to credit me for it, link to the license, and show if changes were made.

3. If you do re-use or modify this content, the resulting content must also be released under the same Creative Commons License.

This last part is the most important. If people are going to put work into a free website, an online guidebook, or even an open textbook, their work should remain free. People trying to make a quick buck should not be able to swoop in, repackage others’ work, and then put it behind a paywall again.

It may seem like a silly thing to think about for a small blog, but these little declarations and our continuing push for open access are the sort of low-level steps that we all can take to ensure that the content we’re producing for education, for science, and for the betterment of the field is, and remains, open to all.

Post Date: March 21, 2015
Categories: site news - personal -
~ ə ~

I’ve been thinking a lot about the tools that allow me to do what I do, and I’m often asked by curious colleagues about what software I recommend for X, Y, or Z.

So, today, I’m going to discuss the software I use to do my work as a linguist and phonetician. All of these are tools which I use regularly, which fill a niche, and which I would be very sad without. I’m not saying that each choice is the best choice for phonetic use, but instead, that each choice is the best for me.

Below are my phonetic programs of choice, organized alphabetically by function. Programs that cost money are listed with their rough prices as of March 2015. All of them run on Mac OS X 10.10 “Yosemite”, and some of them are Mac only.

I will freely admit (sorry) that I’m biased towards free and open-source software for academic research. Using a widely available and free program to do something makes my work a) less expensive, b) less likely to end up abandoned and obsoleted by some company, and c) much more easily shared with and reproduced by other researchers.

I also refuse to teach my students to use software that they themselves can’t afford, use, or buy. A $50 “student license” for a mega-program is great, but if I’ve given students skills that aren’t useful unless they can buy a $1000+ “private license” every few years once they leave school, I’ve given my students little of use.

So, although there’s a role for non-free software, and I do pay for many great programs for general computing (without begrudging the authors), I tend to favor free, and you’ll see that with two exceptions (Marked and iA Writer Pro), every application I recommend and use in my academic life is free.

Audio Conversion - XLD and iTunes - Free

XLD supports weird formats (.wv, .flac, .shn), and is really great at working with lossless file formats that Praat doesn’t read. And iTunes supports other formats, particularly things like .mp3 and .m4a, and allows you (by tweaking the “import settings”) to convert to .wav or .aiff straightforwardly.

Between these two programs, Praat, and Miro Video Converter (for video), I can convert nearly anything into nearly anything else.

Audio Recording - Audacity - Free

Praat can record. But it’s somewhat limited in its ability to record long sound files, it’s finicky in recording from multiple inputs, and it makes it shockingly easy to delete what you just recorded. So, when I’m recording data in bigger chunks, I use Audacity.

There are some other really nice bits of software for recording. Apple’s GarageBand is decent for recording as well. Adobe Audition ($150) is well respected, as is Logic Pro ($200), but both are overkill for phonetic recording, and do not come anywhere near justifying their price tags.

Bibliography and Article Organization - Bibdesk - Free

This program is incredible. It allows you to keep a library of all of your references in the open (and common) BibTeX format. It allows you to tag these references with keywords, and group by those keywords. It allows you to attach PDF copies of articles, and then organizes those PDFs by author on your drive. And, most magically, it allows you to select a few references and then, with the click of a button, email them to a colleague.

It integrates (via Dropbox) with PocketBib for iOS (which is getting dated, but still good), so you have all your papers with you on the go.

If you’re using LaTeX, this is the absolute best solution, as LaTeX plugs right in. But even if you’re not, seriously consider using BibDesk to sort your bibliography, books, and articles.

Editing Code - TextMate 2 - Free

I’ve used every major editor. I’ve spent time with vim, emacs, BBEdit, and SublimeText, but I keep coming back to TextMate. But the choice of a text editor is deeply personal, almost religious. Walk your own path.

Experiment Design and Running - PsychoPy - Free

PsychoPy is a free and open source experimental design suite. It has a user interface for building experiments, and lets you write the experiment as python code behind the scenes if you’d like to get fancier. It has all the features I’ve found that I need, and isn’t that complicated, particularly for easy experiments.

Paid alternatives like ePrime ($1000) exist, and do offer some increased power (and certainly better tech support!), but ultimately, $1000 will buy a lot of tutoring in PsychoPy and Python, and will pay a lot of subjects with the cash left over.

A buggy alternative - PsyScope X - Free

This is a modernization of experimental design software written in the 1990s. It’s free, and it’s workable on modern Macs. It’s also got a decent GUI for programming experiments, and works with many different hardware response boxes. However, it’s also very buggy, and you will spend as much time troubleshooting your project as you did creating it in the first place. If you can’t use PsychoPy, and you can’t afford ePrime, this is an alternative. But, having used this for years and then moved to PsychoPy, I would never go back.

Forced Alignment - P2FA - Free

The Penn Forced Aligner is a great tool for aligning text to recordings of American English speech. I talk a lot about it in this post. For French, I’ve used EasyAlign, which gets reasonable results, and a newer port of P2FA called “SPLAligner” by Peter Milne, which gets really great results.

IPA Fonts and Keyboarding - This - Free

I’ve maintained (since 2007) a post on installing IPA fonts on the Mac. So, obviously, I recommend what I recommend there. Check it out!

Machine Learning - R - Free

I’m increasingly of the mind that phoneticians are going to want to use machine learning to study speech and speech perception.

I’ll talk about R for statistical uses below, but the very same R has some capable libraries for machine learning. In my dissertation, I used machine learning (specifically SVMs and RandomForests) to model the perception of acoustic cues in humans, and to test features quickly and cheaply. To do this, I used three libraries, or extensions to R:

• e1071 - For SVM model training, testing, tuning, and creation
• randomForest - For creating RandomForests.
• tree - For vanilla decision trees

Those packages made it easy to do machine learning using the same data I used for all my other analyses, and to output my graphs and tables all at once. 10/10, will use again.

A worthy alternative - Scikit-Learn - Free

If you already speak Python, or want more power and flexibility, Scikit-Learn is a great option. It has lots of algorithms, lots of libraries, and good documentation. The only reason I didn’t use this package is because I already know and love R, and because it was easier to work with my data in just one place.

PDF Reading - Skim - Free

OS X includes Preview, which is great, but Skim is just a bit nicer. It shows you a table of contents for files that include one. It lets you jump to a page by entering text. And it plays very nicely with LaTeX, highlighting recent changes. If you’re happy with Preview, stick with it, but if you’re not, use Skim.

Presentation Software - Reveal.js - Free

This is a very nerdy pick. Basically, it allows you to make presentations which are also websites. You can have transitions, a presenter’s display, you can advance the slides with a remote, you can build items in progressively, and you can include images, audio, and video.

The beauty is that all of your presentations are actually html files (with bits of markdown, if you’d like), and that writing them is as easy as making an outline of a paper. You don’t need to worry about adjusting spacing, font size, etc, because that’s all done for you. This, particularly with Markdown, allows you to tap out the next day’s powerpoint in an email to yourself on your phone, if you’d like.

You can also do fancy tricks, like posting your slides online for students, embedding YouTube videos, and styling your presentations using CSS. Students particularly loved being able to go through the slides, complete with sound and video, at home, and even on their smartphones.

The problem with it is that your slides are all websites. Unless you’re content editing bits of HTML, CSS, and tastes of JavaScript, this may not be for you. It’s also tougher to do fancy composite images (“I’m going to make a Koala pop up on top of the vowel chart, then slide off to the right!”).

It’s not for everyone, but it’s really powerful. Now that I’ve started using reveal and used it to run a full 27-lecture course, I can’t go back.

An expensive alternative - MATLAB - $500

MATLAB, a proprietary programming language, can be extended to do much of what Praat does, and MATLAB is more powerful for strict signal processing. Unfortunately, it costs $500 (no, that’s not a typo) even for non-student educational use, and even more if you’re outside of academia. This means that students won’t be able to use it after graduation, that colleagues won’t reliably have access to it, and that you will always be just a bit poorer than you otherwise would’ve been.

I’m hoping that, much like R (see below) has replaced expensive and proprietary options like SAS and SPSS for many academics, Octave or Python with specific libraries will reach feature parity for signal processing, and thus a more powerful tool will come online for widespread use. But until it does, I’m doing my best to get by without MATLAB, and hope plenty of other folks do the same.

Statistics - R - Free

R is spectacular. It’s great for statistics, for data manipulation, for graphing, for generating tables, and even for machine learning. In addition, because it’s more or less a programming language, although the learning curve is higher, one can conduct an analysis in such a way that somebody else who has your data and your code can reproduce your analysis exactly in a few keystrokes. At this point, it has surpassed (in most relevant ways) its non-free competition, and if you’re planning to do statistics (or planning to learn it), you should be using R.

Because R is a programming language, it also makes use of libraries, which add functionality. A few of these merit special mention, and all are downloaded through R:

• e1071 - This is a package for doing many kinds of machine learning tasks in R, and works really well for SVMs.
• ggplot2 - This is the package for graphing in R. It’s got a learning curve, but allows for true beauty.
• lme4 - This is my favorite package for running linear mixed-effects models (and here is a great tutorial for using them).
• praatr - PraatR is an interface to Praat within R, which allows you to use Praat commands within R for analysis. I haven’t used it much, as I think in Praat scripting, but the author and the concept are both brilliant.
• stargazer - Allows easy export of tables in R to HTML, LaTeX, or plaintext. Nearly every table in my dissertation was generated directly from the data or analysis using Stargazer.
• vowels - This is strictly for phonetic data. Discussed more below.

Video Conversion - Miro Video Converter - Free

I’m often given a video file, whether from YouTube, field recordings, or otherwise, and asked to do some analysis. When that happens, I use Miro to turn it into a sane format (usually mp4), or to extract the audio (using the “Format” setting).

Vowel Plotting - The ‘vowels’ package for R - Free

Although you have to reformat the data into a very specific column ordering, then import to R, the ‘vowels’ package is great, and produces some really beautiful vowel plots. It’s better than any other approach I’ve found.

YouTube Video Downloading - youtube-dl - Free

This is a free and easy command line utility for downloading videos from YouTube. If you wanted to download a video of Ken Stevens being irradiated for phonetics, you would just install youtube-dl and type the below at a terminal:

youtube-dl "https://www.youtube.com/watch?v=DcNMCB-Gsn8"

You can then use Miro (see above) to convert to sound, and next thing you know, you’re good to analyze.

Word Processing and Writing - XeLaTeX or Markdown - Free

I describe my complicated writing workflow in the last post, but I love LaTeX, and XeLaTeX makes it even better, allowing full Unicode support (so, effortless IPA, and more!). There’s a reason that I’ve taught LaTeX for Linguists several times. I’m passionate about it.

If I’m writing something more casual, or using my crazy workflow above, I’ll write using Markdown.
Markdown is a simple way to mark formatting in text, which can then be transformed into other formats using tools like Pandoc (Free) or Marked ($14). It’s a nice way to write plaintext, and let formatting just get out of my way. I’m partial to iA Writer Pro ($20) for putting markdown text on a page in a pleasant environment, but TextMate 2 (Free) is 80% as good for free.

However, both of these solutions are really geeky. LaTeX has a scary learning curve, and Markdown is kind of finicky, given that you need a second program to print it. Both are unquestionably worth the time to learn, but if you haven’t the time, patience, or geek-tolerance, there’s always…

An expensive alternative - Microsoft Word - $80

If I’m not in Markdown or TeX, or if I’m collaborating with somebody who’s scared of TeX and doesn’t want to use Overleaf (formerly WriteLaTeX) (Free), I’ll use Word. But I won’t be super happy about it.

So, I hope you’ve enjoyed this list of phonetic tools and software, and that somewhere, somebody out there finds something new and wonderful.

~ ə ~

My particular form of procrastination is optimization. You can tell I don’t want to cut two bags of potatoes when I’m sharpening the kitchen knives. You can tell I’m uninterested in laundry when I’m cleaning the dryer barrel. And when I didn’t quite know where to go with my dissertation prospectus, well, I decided that I needed to develop a more graceful way to write it.

For the last few years, I’ve written all my large papers in XeLaTeX (using XeLaTeX for unicode support, making IPA much easier). I love LaTeX, love BibTeX, and love not worrying about formatting. But writing long sections of text in LaTeX kind of sucks, because it’s rather clunky and there are no good editors for LaTeX on mobile devices.

In LaTeX, making text bold requires you to wrap the word or phrase in eight characters’ worth of tags. Section headings are ugly, and also have accompanying tags. Every %, & or _ must be escaped. LaTeX is powerful for doing complex things, but while writing prose, it just gets in the way.

Why Markdown?

I decided that I’d rather write in Markdown. Markdown is an easy syntax for writing, where you can define section headings as easily as:

# This is a section heading


Bold, italic, and bold-italic are as easy as:

**bold**, *italic*, ***bold italic***


Most importantly, it’s designed to be quick to use and type using available symbols. So, in short, writing Markdown doesn’t suck, but I wanted to still use the best of LaTeX, for things like dynamic numbering, BibTeX automatic bibliographies, and easy creation of nice tables.

So, I hacked together a solution using Pandoc, the same software I use to generate this site from Markdown.

Turning Markdown into LaTeX

First, I created two documents which had the preamble code for LaTeX in one (everything up until the first section heading), and the footer info in the other (the bibliography).

Then, I created a markdown file for the meat of the paper, which I’ll later convert into LaTeX and stick between the header and footer. I stuck this markdown file in my Dropbox folder and I edit that markdown file to write the paper, whether on a Mac (using TextMate), or on an iPad or iPhone (using Editorial). You can make individual chapter files and concatenate them, if you’d prefer, but I stuck to one mega-file.
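The assembly idea can be sketched in a few lines of shell. All file names here are made up, and the real pandoc call is commented out (it requires pandoc to be installed); the point is just "convert the body, then concatenate":

```shell
#!/bin/sh
# Sketch of the preamble/body/footer assembly. File names are hypothetical.
printf '%s\n' '\documentclass{article}' '\begin{document}' > preamble.tex
printf '%s\n' '\bibliography{refs}' '\end{document}' > footer.tex
printf '%s\n' 'The body of the paper, written in Markdown.' > body.md

# The real conversion step (requires pandoc):
# pandoc -f markdown -t latex body.md -o body.tex
cp body.md body.tex   # stand-in so this sketch runs end to end

# Sandwich the converted body between the saved preamble and footer.
cat preamble.tex body.tex footer.tex > paper.tex
cat paper.tex
```

The resulting paper.tex is what actually gets compiled, so the Markdown source never needs to know about the preamble at all.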

The beautiful thing about this approach is that I can write Markdown, which is readable and pleasant, 95% of the time, and then switch into LaTeX in the same file to add something fancy, such as a \cite{Paper Citation}, a \ref{reference} to a \label{labeled section} or a \footnote{}.

I can also include LaTeX tables, throw in \input{} commands to read other tables in, and use \vspace{} where needed. There’s no penalty to going back and forth, and I have the power of LaTeX when needed, and the easy-pretty of markdown when I’m just writing.

This also allows me to use Stargazer, a package for the R Statistics Suite which allows you to directly output data as pretty LaTeX tables. I just have Stargazer output to a .tex file, then \input{} that .tex file. It’s both wonderful and reproducible, because all of my figures, tables, and models are generated directly by R, so no “copy-paste” errors are possible.

How?!

Well, the joy is in the script that builds the document. When I’d like to see a final version, I run a script in the terminal (or hit Cmd+Option+Control+Shift+PageDown, triggering it through Keyboard Maestro).

Although you’ll want to look at the script itself, which is extensively commented, basically, it does the following:

1. It copies all of the text from Markdown files, and all of the analysis scripts, into a single place.
2. It turns the Markdown into a LaTeX file using Pandoc.
3. It cleans up the output a bit.
4. It tacks a custom header and footer onto the output, which contains all my style information.
5. It builds the document and bibliography in LaTeX.
6. It opens the PDF copy in a PDF reader, and copies the latest PDF version to my dissertation folder.
7. It builds a .tar.gz archive containing the complete text and analysis scripts, and saves it to a “backups” folder by date.
• This way, if I mess something up, I can always go back to the last version(s), and I’ve got a way to compare changes if I need to.
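The dated-backup step (number 7) is the easiest of these to sketch. Assuming a backups folder and made-up file names, it amounts to one tar command keyed to the date:

```shell
#!/bin/sh
# Sketch of the dated .tar.gz backup step. File names are hypothetical.
mkdir -p project/backups
printf '%s\n' '# Chapter 1' > project/dissertation.md
printf '%s\n' 'summary(model)' > project/analysis.R

# Bundle the text and analysis scripts into one archive, named by date.
tar -czf "project/backups/diss-$(date +%Y-%m-%d).tar.gz" \
    -C project dissertation.md analysis.R

ls project/backups
```

Running the build more than once a day overwrites that day’s archive; appending a timestamp (`date +%Y-%m-%d-%H%M`) would keep every run instead.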

It combines the best parts of simple plaintext writing with the best parts of LaTeX, and allows me to be as productive on my phone or iPad as I can be at home (with the exception of rendering a new PDF, and using PocketBib for reading and finding citekeys). In short, it allowed me to write 72,000+ words of dissertation, and not hate my life. I’ve since moved my guide to using Praat to a similar workflow, so I can write it using Markdown too!

Most importantly, though, I’ve found a way to make writing a dissertation geekier than it already was. And that, my friends, is my real accomplishment.

~ ə ~

Hi everybody! Three pieces of news, two language related.

Dr. Vowels?

First, I’ve just completed and submitted to my committee my doctoral dissertation. For those of you not familiar with the American Ph.D. system, a Doctoral Dissertation is a large paper (or small book) in which you aim to make a teeny, tiny increase in the world’s knowledge on something or another.

My dissertation, which I described here for a non-linguistic audience, worked to get a better idea of how we humans are able to hear vowel nasality (the difference in the vowel between “pat” and “pant”). Here’s a word-cloud of what I wrote, just for grins:

So, after running four experiments, recording around 4000 words, and writing 170 single-spaced pages, I now have some conclusions, which I’ll present to a committee of six professors from my university on March 18th, and, after they grill me about it for around 2 hours, if they agree I’ve done good work, I will have a Ph.D!

After that, I’m moving to the University of Michigan to act as a Post-Doctoral Research Fellow on a major grant (the “Post-Doc” option discussed here), investigating how people’s perception of different nuances of speech is reflected in their production of these things.

So, between that, publishing my dissertation research, and continuing to work on my guide to phonetic analysis using the Praat software package, I’ll be busy, but I’m also hoping to post a bit more often here.

As part of that process, I’m going to be making a few updates to the site itself. I’ll be going through some old posts to make sure they’re linguistically sound (and maybe add sources), fixing some long-standing formatting issues, and tweaking the code a bit. I’ll keep all the URLs the same, but don’t be too shocked if the odd post disappears, or if some things are updated or changed.

Unrelated…

Finally, as many people who know me will mournfully attest, I’m a lover of puns. I collect and often deploy wordplay, puns, father goose stories, or other humor that makes you hurt and laugh in equal parts.

Over the years, I’ve developed quite a large list of terrible puns, and I’ve decided to put them online, because some people just want to watch the world groan. If you’d like to be in pun-pain, well, go to my #crappypuns website at:

http://savethevowels.org/crappypuns/

I welcome contributions, either by comment, Facebook, or email. And if you ever hear a good fish pun, be sure to let minnow.

(Sorry.)

Post Date: March 3, 2015
Categories: humor - personal -
~ ə ~

I’m kind of a nerd about websites. I’m not content to use Dreamweaver, or just write some code. I always want my websites to be as lightweight and optimal as they can be.

When it comes to web publishing, I’ve always been a bit of a minimalist. Over time, I moved this blog from a hosted solution, to a Wordpress install, and then eventually, to Jekyll (that migration process is explained in detail here).

I started off creating my personal site using Jekyll. This was rather a waste, considering that Jekyll’s made for blogs, and that site is really just styled text at its core, with nothing temporal, and no need of fancy tags or pagination. But I still wanted to be able to write in Markdown and style it with CSS afterwards.

So, I got the idea to write everything in markdown, style with CSS, use Pandoc to convert it to HTML, then just drop it onto the server.

Oh, simple!

Actually implementing this was one of those things that’s simple once it’s working, but takes forever to get exactly right. The core of it is a single shell script (viewable here, slightly de-identified) which converts the markdown to HTML, then uploads it.

The hardest part of setting all this up was figuring out the syntax of the below command, applied to each folder:

find . -name \*.md -type f -exec pandoc -B includes/spcvhead -A includes/spcvfoot -o {}.html {} \;


In English, it finds any file which ends with “.md” (a markdown file), then executes the pandoc command, including spcvhead (containing the header info, overarching style info, etc.) (B)efore the file, then spcvfoot (A)fter. Then it outputs the result as .html files. If you want different headers/footers in different parts of the site, just run the command on each folder with a different set of includes.

This gives you folders full of .md.html files, due to a quirk of how Pandoc operates. The script then goes through and renames those to plain .html files with the below command:

find . -depth -name '*.md.html' -execdir bash -c 'mv -i "$1" "${1//md.html/html}"' bash {} \;


Then, it uploads the contents of the site folder (html, css, etc.) to the server using rsync, and goes through and removes the newly generated .html files (to keep the local folder tidy).
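That upload-and-tidy step can be sketched as below. The rsync line is commented out because the server path here is made up; the cleanup itself is just a find with -delete:

```shell
#!/bin/sh
# Sketch of the upload-and-clean step. Paths and server are hypothetical.
mkdir -p site
printf '%s\n' '# A post' > site/index.md
printf '%s\n' '<p>A post</p>' > site/index.html   # as if generated by pandoc

# Upload the site folder to the server (commented out: no real server here):
# rsync -avz site/ user@example.com:/var/www/site/

# Remove the generated .html files to keep the local folder tidy;
# the markdown sources stay put and will be regenerated next run.
find site -name '*.html' -type f -delete
ls site
```

Deleting the generated files after each upload means the markdown sources remain the single canonical copy, and the HTML is always freshly built.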

This allows me to write pages, posts, and essays using mostly markdown, with occasional dashes of HTML/CSS to style particular elements (page titles, lists, images).

It works great, and is the closest a CMS has ever come to simply getting out of my way. Hopefully this description (and the shell script that makes it work) will prove useful to others.

~ ə ~