Intervention
By Invitation
Reflections on the Digital Humanities: A Conversation with Matthew Warner and Nichole Nomura

The Digital Humanities has existed as an institutionalized field of research at Stanford for more than a decade now, drawing undergraduates, graduate students, and researchers from around the globe. This series, a collaboration between Arcade and Stanford’s Center for Spatial and Textual Analysis, spotlights leading research in the digital humanities at Stanford and asks key contributors to reflect on the expansion of the field, its culture, and the major misconceptions that remain.

Why has this field sparked so much public engagement in its projects and debates? How has the digital humanities changed what it means to be a more “traditional” humanist? And how is the field engaging new developments in technology like artificial intelligence?

In this interview, former Interventions editor Charlotte Lindemann speaks with Matthew Warner and Nichole Nomura, Associate Directors of the Stanford Literary Lab.


CHARLOTTE LINDEMANN: Let’s start with how each of you entered this field or came to be interested in these questions.

NICHOLE NOMURA: I’ve been a member of the Stanford Literary Lab for a long time, seven or eight years, including most of my graduate career at Stanford. I didn’t know what the digital humanities was before I came to Stanford. I think I’d read one article in a theory survey course. The instructors were basically like, some people are doing this thing, it’s called digital humanities, we should all be extremely suspicious of it. And then we moved on.

And then I came to Stanford, and during the admit visit the Lit Lab gave a presentation, and a colleague, Steele Doris, presented on fanfiction and the work she was doing as part of the lab. It just felt incredibly exciting and welcoming, first of all, that a woman researcher was working on fanfiction at Stanford, and the fact that she was using digital humanities methods was actually kind of secondary. I was like, oh, this is a space that I want to be in. These are people I want to talk to. And so that’s how I got into the lab, through a project, which I think for many of us in the lab is our way in. There’s something we really care about studying and there’s a room full of people doing it. So we come and we hang out and we talk about the methods and we collaborate in that way.

MATTHEW WARNER: I had also brushed up against the digital humanities in a theory survey in undergrad, and I distinctly remember coming here for the admit visit and being like, oh, this is where it’s happening. I had absolutely no idea about the Stanford Literary Lab, it was not on my radar at all. I only got involved later. Actually, Nichole roped me into a project on historical fiction. At the time, I was working on late medieval and early modern conceptions of the past and literary temporality. So I started coming to meetings for that project and started coming to meetings for other things and it slowly took over.

CL: The Literary Lab is such a clearly defined community in our department. How do you see your work there as fitting into the larger landscape of digital humanities research?

MW: Most of the digital humanities work in the English department is connected to the Literary Lab, and the Lab gives a particular sort of computational text mining style of digital humanities a higher profile, although there are absolutely digital humanities research projects that don’t look anything like the sort of work that we do in the Literary Lab. I don’t know whether “digital humanities” is really the most cohesive grouping of things.

NN: Yeah, when I’m pressed for a definition of the digital humanities I usually draw a Punnett square in the air and explain the four quadrants: you can do digital things on digital materials, you can do digital things on non-digital materials, and then you can go the other way too: you can do non-digital things on digital materials, and you can do non-digital things on non-digital materials. I map that in the air and point out that digital humanities covers three of those four squares. It’s a useful shorthand, and we all end up at the same conferences sometimes.

I think that Matt’s point about English and DH and the Lab being a little bit more coextensive is a good one, but I’d also say that I could bring anything in the other Punnett square cells to the Lab and find people to talk about it with. For instance, if I did bring a mapping project to the Lab, people in the Lab would be equipped to talk about it, and flexible enough to think about it and interested in a way that might be harder in a field-specific context. You get a different kind of buy-in from people in the digital humanities: there is a community-level investment in figuring out how those different elements of the Punnett square fit together. And we have to spend less time proving that those spaces are worth studying, so we can just get to work.

CL: I remember someone told me early on, maybe on that first admit visit, that the best thing about the Literary Lab is just being around a community of people who don’t need you to constantly defend your decision to use digital methods.

NN: Or who are equipped to engage with you about those kinds of questions. I come to the Lab because those are the people who are going to push my thinking the hardest and disagree with me the strongest about my decisions regarding digital methods. So it’s less asking you to defend yourself than really engaging with you.

MW: I think that’s right, or at least that’s the ideal. Imagine you take a computational project, like a text mining project, to a group of subject-matter scholars. Right? Say you’re working on the 20th century novel. You’re talking to a bunch of scholars of the novel, and there’s a way that, especially if what you’re doing is on the more computationally complex end of things, it can just be kind of mystifying, it can be hard for them to fully engage. They can talk about the premises of the project, they can debate about the selection of texts, things like that. But you take the same project to the Lit Lab and somebody can point out that step two of your methodology just doesn’t work. You’re doing something computationally that doesn’t make any sense at all. We’ve seen things make it all the way through peer review and get published, and then someone points out, this is methodologically unsound. And it’s because a bunch of people who know a lot about the subject matter are not necessarily well versed in the kinds of tools the author is using. You really do want to make sure that you’re not in a methodological bubble. The lab is a space where there’s a higher than average concentration of people who might be able to spot your methodological mistakes. Maybe they read a paper last week and have a recommendation, like, this new approach worked for these authors, you could try it.

CL: To what extent are the methods themselves, whether developing new tools or applying existing tools in new ways, a driving force in your research? And to what extent is your subject matter–say, burning questions you might have about the 20th century novel–more the engine for you, and the methods more of a means to that end? And this might be different for each of you.

NN: I think it’s a false binary. The field is really young, and that’s what makes it really exciting. We’re still figuring out the relationship between methods and reliable signification off of the results of those methods. For instance, people who have an investment in historicism are thinking about the method at the same time that they’re using it and getting results off of it. The work happening outside of computational literary studies is also about making rules for making meaning based on theories. I don’t see what we do in the digital humanities as fundamentally different. It’s still moving between a theory and a way to identify that theory’s implications for your meaning-making practices, and then being able to convince other people that what you just said is valid, is consistent across fields, across disciplines, across time.

CL: What about for you, Matt?

MW: Nichole has me fairly persuaded that maybe this is an unhelpful sort of binary. But I often feel like I am more methodologically invested in computational tools in and of themselves than a lot of the people I talk to. For me, the basic fact that we can count words in texts–I’m thinking of the basic stylometric process–that the frequencies of very small, boring words, like “the” and stuff like that, allow you to reliably identify the author of a text. That’s just really interesting to me. I don’t actually find the conclusions of stylometry as fascinating. I’m not very interested in who wrote what, usually. But the basic fact that these methods can produce meaning about literary texts using computational approaches of radically varying complexity, whether it’s counting words or it’s a fancy large language model, that these things actually work, I find so interesting. Whether the results are good, bad, boring. You know, who wrote this Federalist paper? I don’t really care, but it’s more interesting that counting articles and pronouns can tell you that.
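For readers curious what that kind of counting looks like in practice, here is a minimal, hypothetical sketch in Python: it tallies the relative frequencies of a short, hand-picked list of function words and attributes a disputed text to whichever known profile it sits closest to. The toy texts and word list are placeholders, and real attribution studies use much longer word lists and measures such as Burrows’s Delta.

```python
# A minimal sketch of function-word stylometry, not a production attribution tool.
# The tiny "texts" and the word list below are placeholders for illustration only.
from collections import Counter
import re

FUNCTION_WORDS = ["the", "a", "an", "of", "and", "to", "in", "that", "it", "is"]

def profile(text):
    """Relative frequencies of a fixed list of common function words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Manhattan distance between two profiles; real studies use fancier measures."""
    return sum(abs(a - b) for a, b in zip(p, q))

known = {
    "author_a": profile("The ship ran to the edge of the bay, and the crew sang in the dark."),
    "author_b": profile("It is a truth that a reader of a novel is in want of an ending."),
}
disputed = profile("The storm drove the boat to the mouth of the river, and the men slept in the hold.")

# Attribute the disputed text to the author whose profile is nearest.
print(min(known, key=lambda name: distance(known[name], disputed)))
```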

NN: Stylometry is such an interesting example because it does have a pretty established interpretive protocol, which is authorship. We count these things and we’re pretty sure that this reliably means author signal. I wonder what the field will look like when other things solidify like that. What does the field look like when topic models have a reliable interpretive protocol? I don’t think we have one for topic models yet. We’re still trying to figure out what a topic is and how we make meaning out of them. Or to give another example, we don’t have reliable methods for quantifying the representation of gender in literature–and that’s fantastic. It’s so much fun. What does the field look like, or does the field ever decide, this is how we count gender? Do we reach field consensus and it becomes a reliable piece of evidence down the line? I don’t know what DH will look like when that’s true. If it ever becomes true.

CL: You’ve both taught digital humanities in the undergraduate classroom. You’ve both worked with undergraduate interns on digital humanities projects. I’d love to hear about teaching CS students versus teaching humanities students, whether there’s a difference, and how you navigate the interdisciplinary nature of the field in what is often a mixed classroom.

NN: One of the recent developments in DH at Stanford is that the data science minor listed Literary Text Mining, an introductory digital humanities course taught out of the English department, as one of its possible courses. So we’ve had an influx of students from data science, which is exciting, but does present pedagogical challenges. The way many of us have been taught or were trained to teach the digital humanities is, you take the humanities and then you add some weird counting on top of it, and all of that weird counting makes your existing humanistic knowledge stronger. In a lot of the pedagogy work I’ve seen, we usually work from the assumption that our students are humanists first, and we’re adding these kind of strange methods on top, and these strange methods make us all better humanists, because after you count characters, you have a better understanding of what character is. But it does presuppose an implicit, if not explicit, definition of character that you’ve acquired over your coursework in the humanities. When we have mixed classrooms, the pedagogical responsibilities there are significantly increased, because it means I do have to give a definition of character before we can count characters. I’m not working to surface an implicit and unstated definition of character that I assume all my students have because they’re humanists. Now, I have to be like, okay, well, here’s what a character is, and now let’s try to count it. And I think that, with time and space, that is a really wonderful and productive exercise for everyone involved. I don’t think it weakens our pedagogy, but it does take more time, and it takes a lot more attention.

It means we have to be really careful, I think, about asserting our learning outcomes and making sure we’re assessing student knowledge coming in. But the interdisciplinarity in the classroom is also a fantastic resource. You just have to be really careful about not making assumptions about what your students know, and being really explicit about everything you’re doing and why you’re doing it. And making sure everyone feels comfortable asking for help or for clarification. It does take much more in terms of resources, time, and training, which I think our field needs to invest in if we’re going to be working from this kind of meet-in-the-middle angle. The other model would be to have prerequisites and to have separate tracks for the humanists and the CS students. And I don’t know that that’s something we want, but it might be responsible in a different way.

CL: That’s an interesting idea, I’d never thought of having different prerequisites for humanities students and CS students.

NN: I’ve definitely seen ads for classes like, “Spreadsheets for Humanists.” And that makes sense, because we know that your background knowledge matters when you come into a learning space. It does. The things you’re going to assume are going to be different. So if it’s useful to assert that this is a spreadsheets class for humanists, I think that the corollary is true. It might be useful to assert that a given class could be a humanities class for CS students, but that’s just going to require a lot of research into what those students actually know and bring into the classroom with them. It’s going to require collaboration across disciplines in a way that takes a lot of institutional investment.

MW: This question gets at timeworn debates in the digital humanities that have often been framed in terms of the value and necessity of being able to code. I think that that framing is a little bit narrow. But broadly construed, what is the place of technical skills and technical expertise in the digital humanities? It’s a sort of vexed question because the digital humanities did not start in the classroom, right? The digital humanities started as a research methodology. And digital humanities research can employ digital tools and engage digital content in all kinds of ways. So when it comes to mapping that variety onto the classroom, the question is, if the addition of a digital humanities class to the curriculum is meant to introduce a technical component or technical skills of some sort, what are those skills, exactly?

This gets weird because, oftentimes, digital humanities classes try to teach a little bit of coding, and that’s a sort of odd thing to do in a humanities classroom for a lot of reasons, not least of which is that there’s a department over there that does that, and does it very well. So why are we doing that? Maybe the thing that we should actually be doing is saying, look, you’re a humanist, and you want to learn to count words at scale or extract place names from novels, so go take some CS classes and come back. That’s not unreasonable to propose to a graduate student who is conceiving of a research project on a longer timeline. Spending six months acquiring skills is a harder thing to add to an undergraduate English major.

NN: I just want to double down on something Matt said, which is the idea that you need to know how to code to do DH, a debate that was foundational to the formation of our discipline as such. I would come down very hard on the side that what we actually do is counting, or at least that counting is possible without coding. The idea that we turn things into ones and zeros is the coolest thing that we do. The thing that brings me together with a librarian creating metadata for a collection of maps is that we’re both turning things into ones and zeros. And you really don’t need to code to do that. You don’t need a computer to do that. I see our work as the really weird move of translating things into numbers and back again, and that can be taught without coding. There is a low-tech, or a no-tech, version of the digital humanities that I think we could really lean into and find some cool, unifying disciplinary concepts. But a lot of the hype and a lot of the funds and a lot of the exciting new research happens in the computational space. And so many of us practice in that space for our research. But there are plenty of other kinds of digital work that we could make the center of a digital humanities curriculum, one that doesn’t look interdisciplinary. We can imagine a digital humanities that is not interdisciplinary, at least in terms of infrastructure needs and majors and minors and all of that. And it’s one that eschews coding and tries to find something more essential and digital and make it our own.

CL: Sort of like an aggressive turn away from the big tent metaphor in favor of the idea that there’s some underlying conceptual thing that unifies us, and it’s pretty simple.

NN: Right. Though I don’t think that most people who work in digitization, say, putting archives online, are going to be as excited as I am about the fact that what they’re working with are pixels, right? But I do think that there is a misconception that we all need to code. I would echo Matt’s argument here that coding and computation and all of these things are maybe ways to make digitization or working with the digital easier and faster. But they’re not the core of what we do.

MW: There’s also just a very pragmatic question. Whatever our theoretical investments, the digital humanities is interested in the conversion of humanistic information into binary data, but also in labeling things, in extracting place names, or character names for that matter, in the datafication of text to see what that kind of reductive maneuver can surface. What can that help us see that we wouldn’t have seen otherwise? Both Nichole and I have taught classes where we have had students create datasets by hand. But I do think the research value of this method emerges in doing it at scale. And so maybe there’s a pragmatic view: the case for learning to code as a researcher feels like it’s mostly pragmatic, not theoretical. It’s not that you have to. But unless you’re blessed with collaborators, knowing how to code yourself is invaluable for producing results at scale.
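As a concrete illustration of the kind of extraction Warner mentions, here is a small, hypothetical sketch that pulls place names out of raw text and tabulates them. It assumes spaCy and its small English model are installed, and the one-sentence input is a stand-in for what would, in a research setting, be a loop over thousands of digitized files.

```python
# A hypothetical sketch of place-name extraction, not anyone's actual pipeline.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def place_names(text):
    """Return entities tagged as geopolitical entities (GPE) or locations (LOC)."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]

# Toy input standing in for a corpus; at research scale this would iterate
# over many novels and aggregate the counts into a dataset.
sample = "She left London for Paris, then crossed the Alps on the way to Rome."
print(Counter(place_names(sample)))
```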

CL: I want to put the question of the affordances of scale to you, Nichole, who I know has thoughts on that. But I wonder if you would agree or disagree, and forgive me if I’m mischaracterizing you, Matt, that one of the big things that digital humanities brings to the table is scale.

NN: I think that computational literary studies absolutely affords work at scales that would be really difficult for many of us to produce by hand. And I would draw a distinction between computation and digital thinking. I think Matt is right, and that, like I said earlier, there’s money, there’s hype, there’s excitement about doing these big picture projects. And I do plenty of them myself. But I also do really, really tiny projects. And I’m not changing my core principles. If you take a book and you highlight every pronoun in it by hand, and then you count them, or just notice them, and then you write a close reading of that book, you’re doing something computational, right? You’re just working at a different scale, you’re using your own working memory, but you’re augmenting it in the same way you would produce data to feed to a large model. Scale is something the field is going to have to think more carefully about going forward. We’ve avoided talking about Large Language Models until now.

CL: Please take us there, I was going to ask.

NN: Scale is on people’s minds right now because of Large Language Models (LLMs). People in text mining are going to have to be more careful about their claims about scale, claims that we probably should have been more careful about all along. Quite practically, things like copyright change our conversations about scale, you know, what we can do on massive corpora of out-of-copyright text is just different from what you can do if you’re a researcher who has to scan every book they use on their own and OCR it themselves. I actually think we’ll find some cool things when people start working at a scale that is more constrained. I think that one of the things the current scrutiny on LLMs is going to push the field to do is think very carefully about corpus building, and maybe we’ll see some really cool, very tiny DH on tiny, personally digitized corpora.

CL: I was going to ask this question in terms of a house style, which is something I’ve been interested in since being a part of the Literary Lab myself. And I know that both of you have experience with these international collaborations with other digital humanities research groups where projects can look very different. So I’m curious to hear about how what we do at the Literary Lab or at Stanford might have its own distinct flavor. And then also in terms of Stanford, do you feel the Silicon Valley culture seeping into the research in any way? This could be in terms of LLMs and AI, or maybe through the undergraduate population and how students approach research, or anything else.

MW: One thing to keep in mind in terms of digital humanities at Stanford is that, at this point, the digital humanities centers at a lot of Stanford’s peer institutions are actually vastly larger and better resourced.

NN: And differently resourced.

MW: Yeah. Look at the projects and research that come out of a place like Princeton. A lot of that work is at a different scale, in terms of institutional resources. The Center for Digital Humanities at Princeton has a staff of professional programmers. Actually, we know a couple of people from the Literary Lab who now have jobs working for these kinds of digital humanities centers, as people who combine technical and subject matter expertise and are ultimately working on other people’s projects. There are no full time employees at Stanford who have the job of helping with digital humanities research projects on the ground, participating in the research in that way. Which means that projects are scaled differently, but they also come about in a less centralized way, right? They come from individual research interests, whether that’s a graduate student’s dissertation or that’s a faculty research project. We’ve all had the experience of interacting with a student, undergraduate or graduate, who’s been sucked into a faculty research project that is clearly scaled off of a model of an institution that has a different kind of support for that sort of project. One of the easy examples is people who want to do big, digital web presence projects. Even when the thing that they ultimately want to put on the web isn’t that challenging, there just isn’t really support at Stanford for a digital project that has an ongoing life on the web. The library will step in and can provide a lot of these services, but there’s no dedicated staff for digital humanities web presence in the way that there are at other places. So that limits and constrains things. More support for the digital humanities at Stanford would make a lot of people’s lives easier and would enable a different kind of research.

CL: It feels so counterintuitive that the digital humanities at Stanford has less support than peer institutions, especially with the Silicon Valley connection, and because we have quite a long history of digital humanities research compared to some other institutions.

MW: The question of the influence of Silicon Valley is really interesting and really hard to map out. One of the interesting features of Stanford life is seeing generations of graduate students move to this place. I think it’s somewhat different for undergraduates because school is, for them, a slightly more liminal space. But you see generations of graduate students move to Palo Alto or somewhere nearby and respond to the culture of Silicon Valley, and the different ways that people absorb and respond to that culture can just be fascinating. But, on the other hand, it’s hard for me to really see a connection between the Literary Lab and what’s happening at Google, for example. I talked briefly with a researcher from Google who was at a CESTA event recently and I felt like we had absolutely nothing in common. I talk to a lot of scientists at Stanford on a regular basis. Most of my friends at Stanford are scientists, and I feel like I have something in common with them in a way that I do not with researchers working in tech. And that’s interesting because I’m not exactly sure what this person’s field of specialty was, but it seemed like it was natural language processing or something in that area, and I talk to biologists with whom I feel like I have more in common. So it might just be that the corporate/academic divide is thicker than we think it is.

NN: One persistent point of contact for us, though, is teaching, because insofar as there are CS majors in our classes now, we are teaching undergrads who are constantly crossing that divide by doing internships over the summer. The tech world is very much a part of their lives. This is one of the areas where classroom practice, the things I learn from my students, the kind of pedagogical content knowledge I’m building in the digital humanities, is probably influenced by tech in ways that I can’t see, because it’s mediated through what my students are learning in their internships and bringing back to the classroom, or by my students’ career goals beyond the world of research. For me, the classroom is definitely the place where that Stanford tech culture that everyone talks about is most visible to us.

MW: True, I took CS106a because I felt like it would help me to understand the perspective of my undergraduates, who had all taken CS106a and were showing up in my class going, “What’s Shakespeare?” Or maybe that’s putting it too strongly, but certainly English is an outsider culture for many students at Stanford. And I felt like it would be interesting to understand the undergraduate culture from which they were coming. And what I found out is that that class is like 25% graduate students sitting in the back.

CL: That's so true. That was me in 2020. As a way of wrapping up, I want to ask you both what you’re excited about in terms of where the field is going and what misconceptions you still encounter.

MW: I think that large language models are going to change the digital humanities enormously in two ways. Nichole pointed to the discussion of copyright and text databases and things like that. But I think another really important factor is that the digital humanities is ultimately full of people who have very cobbled-together technical expertise, and the power of generative language models for coding, for somebody who has some technical expertise, is incredible. It’s easy to find people out there now who are producing things with the aid of Copilot. I think of myself as a fairly technically proficient programmer, and there are plenty of things that I would not be able to do on my own, or that would take me a week to do, that people with just enough technical expertise to formulate a prompt that returns what they’re looking for can now just sit down and do. You can say, like, okay, this line is working, or this isn’t doing what I want. The digital humanities is uniquely full of people who are in a perfect skill spot for that.

A lot of what we do isn’t fundamentally difficult for people with a really strong CS background. But most of us don’t have really strong CS backgrounds. We often have enough experience to take the code from an old project and reconfigure it. That’s the kind of thing that AI models are quite good at. So I think we’re going to see a big erosion of some of the technical inequalities in the digital humanities between the people who can do technically complicated things and the people who don’t have those skills. There are a lot of digital humanists who are deeply in tune with concerns about large language models and their training data and the ethics of the uses to which they’re being put and all these things. And I do think that feedback is going to change the way the models are built going forward, whether that’s Copilot or ChatGPT, or whether that’s something else. Hopefully in the future we’ll start to see models that are a little bit more open source, that feel a little bit more thoughtful and responsible. That’s coming, I think, and that’s going to change things in a significant way.

NN: For me, the thing I’m most excited about in terms of where the field is going is DH pedagogy. The explosion of grants and concern about large language models in K-12 is an opportunity for pedagogy to start thinking quite seriously about K-12 spaces. I’m not saying that we need ChatGPT in every high school classroom. I’m saying that the conversation has been started by our anxieties about ChatGPT in the high school classroom, which has brought digital humanities into pedagogy spaces that I don’t think we would have been invited into five years ago. The same is true and moving faster and more obviously in undergraduate education. Our anxieties about large language models are going to provide an opportunity to think really deeply about pedagogy, and what it is that pedagogy can do, what it should do, whether teaching a certain kind of digital literacy is its own thing, or part of our English literacy responsibilities. All of those questions I’m really excited about. I’m nervous, of course, but I’m excited about them. I think that it’s a cool moment for the field to be where it is.

In terms of misconceptions, I think we’ve circled around this a few times, and I’ve even made interjections to this effect, and it sounds a little pedantic, but I think the digital/non-digital binary and the use of the term “analog” to describe traditional humanistic work is something that could bear a little more scrutiny. Doing things by hand is not necessarily analog. And I think we could productively decouple computation and coding, counting and coding. There is a misconception that everything I do as a digital humanist involves a computer. That is probably the misconception I fight against the most. I can do a lot of really cool digital humanities stuff by hand on a piece of paper with old-school stem-and-leaf plots, and that’s awesome. I love doing that. And it doesn’t require a lot of money or all of these other things. Some people certainly do digital humanities in big, expensive ways, and good for them. But we don’t have to.

MW: This gets to another major misconception I’ve encountered. A lot of digital humanities work is collaborative. There are lots of reasons for that. It’s partly that very often there’s a big tent aspect to the technical skills. And since no one, or at least no one in our generation, has an undergraduate degree in the digital humanities, no one person has all of the skills. Collaboration is a way to bring those skills together to work on one project. One thing that can be really frustrating with the discourse around the digital humanities is that there can be this misconception about how funding is spent. You look at the funding that digital humanities research projects get, and at the end of the day the funding goes to pay people. Sure, there are projects that have fancy computational needs, but a lot of the time the funding goes to a graduate student who needs funding for another year because they’ve been sucked into a bunch of digital humanities research projects, or to pay an undergraduate RA, or to give someone summer funding in return for their labor on this project. There’s a tendency to assume we’re building some kind of digital humanities particle accelerator and the money is going to space lasers or something. In reality, the money is usually going to a student.

Whether or not the digital humanities has to be construed in terms of these large funded projects is hard to say. But so long as we aren’t training people in graduate school with the background that they would need to do these projects alone, we’re going to need collaborators, and that means more paid research positions. I think that scale is really interesting to digital humanists, and people are always going to be drawn to projects that are too large for one person. So this is a practice or a research model that’s going to continue for a long time. And it’s worth putting the human researchers at the center of it. The focus should ultimately be on them, that’s where the grant money goes. It can be frustrating to talk to people who think that I have a room full of super computers somewhere. If I did, there are things I would be doing with them, but I don’t.

NN: I want to add on here and highlight what Matt says about projects that are appealing because they’re too big for one person. I think a lot of existing conversations about scale in the digital humanities have a single person at the heart of them: it would be impossible for one researcher to read all of these novels, but it’s equally impossible for a hundred researchers to read a million novels. There’s something special and cool about the fact that we can think about projects that are too big for one person. Sure, in terms of the time it would take to read all of the books, but also in terms of disciplinary expertise. That is a really valuable reframing. And I think we should lean on that. It’s a useful way of thinking about what the humanities are for. The goal is to share and contribute, and we should underline that.
