Entries Tagged as ‘Digital Humanities’

October 14, 2009

How Many New Novels are Published Each Year?

In my recent talks, I’ve been saying things like “there are tens or hundreds of thousands of new novels published every year, and I just can’t read all of them.” Matt Kirschenbaum says this demonstrates a deplorable lack of initiative in our younger scholars, and he’s probably right. But is my count reasonable? I pretty [...]

September 23, 2009

Allegory in Single Authors

I’ve been following up a suggestion from Jan Rybicki about discovering statistically distinguishing features of allegorical and non-allegorical writing by comparing individual works by single authors rather than (or preliminary to) large corpora. This has some downsides (I don’t expect it to be much good for detecting characteristic terms/lemmata, for instance, which will be dominated [...]

September 1, 2009

Followups on the GBS Settlement

There have been some very smart comments on (and around) my previous post on the Google Book Search settlement. If you’re interested, you might want to see the comments section of that post, plus two good posts by Eric Kansa, one before and one after the recent GBS conference at Berkeley.
Most of my thoughts on [...]

August 26, 2009

Google and EPUBs

Google just announced that they’re making a million+ public-domain books downloadable in EPUB format. This is an improvement over the old situation, where you could download PDFs (sans OCRed text) of those books or read them in plain text online (one physical page at a time), but not download a small, well-OCRed text copy.
I’d be [...]

August 25, 2009

Victorians! Science! Semantic Indexing!

I had a very pleasant talk yesterday with Devin Griffiths, a late-stage grad student at Rutgers. (Thanks to Martin Mueller for putting us in touch.) Devin’s working on some cool LSA techniques to extract information about analogies from Darwin’s Origin and other nineteenth-century texts. He’s just started a blog to track his work and put [...]

August 23, 2009

Why I’m in Favor of the Google Book Search Settlement

When Google announced their book-scanning project five years ago, most academics I talked to about it were pretty happy. These days a lot of that enthusiasm seems, if not to have disappeared, then at least to have been tempered by serious doubts. I share some of these, but on the whole the settlement is [...]

August 11, 2009

Reading with Machines

I just put up a longish post over at Early Modern Online Bibliography called “Reading with Machines.” It’s a highly selective and impressionistic overview of literary DH work, plus a bunch of links to relevant articles/sites/blogs/etc. Might be of interest to some; I may revise it at some point for inclusion here.

July 30, 2009

Comments as Blogging

I’ve been taking part in an interesting conversation over at the Early Modern Online Bibliography blog, in a thread on the MONK Project. I’ve laid out some arguments about machine-aided work that I should probably pull together here at some point. In the meantime, see this post and thread. But check out the whole blog, [...]

July 9, 2009

POS Frequencies in the MONK Corpus, with Additional Musings

This post is on the work I presented at DH ’09, plus some thoughts on what’s next for my project. It’s related to this earlier post on preliminary part-of-speech frequencies across the entire MONK corpus, but includes new material and figures based on some data pruning and collection as mentioned in this post (details below).
A [...]

July 5, 2009

Some POS Frequency Factoids

I’ll be posting a couple of times in the next few days about DH ’09, THATCamp, and the state of my project. First, though, a handful of (mildly) interesting plots concerning part-of-speech frequency correlations from the MONK corpus.
MONK contains about 1,000 novels and novel-like works spread over the eighteenth, nineteenth, and twentieth centuries. (The full [...]