Peter B. Kaufman is Associate Director of Development at MIT Open Learning. He is the author of The New Enlightenment and the Fight to Free Knowledge (Seven Stories Press, 2021) and founder of Intelligent Television, a video production company that works with cultural and educational institutions around the world.
In this interview, he discusses his latest book, The Moving Image: A User’s Manual (MIT Press, 2025), with Shop Talk editor Russell Harper.
RH: Thank you for joining me today. In your new book, you call on academic publishers and other education-minded organizations to do more to preserve and annotate the growing mass of audiovisual content that’s been accumulating since the 1890s, or a decade before The Chicago Manual of Style was first published as a guide in support of styling and annotating text. You also make the case that the age of print is coming to an end and that our centuries-old habits of scholarship need to be updated to use video as well as text as a primary means of publishing and citing the results of research. Is that a fair assessment?
PBK: Yes, and it’s also a call for our knowledge institutions to think of audiovisual formats as we produce and share exponentially more of what we know. The book is an effort to alert everyone, especially in the academy, to the fact that more and more people are getting their news and information through video and audio and screen-based media, and that this trend won’t stop. I also write about reports describing dis- and misinformation as the number one threat to international society today—ahead of war, poverty, climate change, and disease. Print has been developing citation and authentication methods for six hundred years now, as you know better than anyone. It’s time we develop similar systems for the media that are going to replace it.
RH: OK, so allow me to play the role of devil’s advocate, if only to highlight the continued value of text in a world where, as you say on the first page of your book, “Video, quietly, almost elusively, has become the dominant medium of human communication.” It’s not that I disagree that this has happened. I’ve witnessed it firsthand, and I agree that it would be great if a lot more video could be made just as easy to locate and search and cite as any page in a book. But I’m a fan of text for its own sake. For example, I usually prefer an edited transcript to a video of people talking. The transcript leaves me free to consider the meaning behind the words at my own pace, without the distraction of how people talk and how they look, and without being influenced by music or ads. I feel differently about movies and TV, but that’s art. Still, if video is the “new vernacular” as you say more than once in your book (as on pages 3 and 43)—having displaced a lot of the text that people like me still spend time reading—why did you write a book instead of making a video?
PBK: I wanted to get your attention.
RH: Your ploy worked, though it may take me a bit more time to accept video as an authority on the level of text. For example, you point to the Zapruder film of the JFK assassination as well as the videos from January 6, 2021, and other events captured in the smartphone era as case studies for the value of audiovisual evidence and the need to safeguard such evidence and make it available to anyone who wants to examine it. But people can watch the same video and come to entirely different conclusions. Isn’t it ultimately up to writers, reporters, and researchers to investigate such events and find out what happened and to report what they’ve found in as much detail—and with as many words—as needed, while citing all relevant evidence, including video? To put it another way, isn’t it still through language, whether spoken or written, that we learn the truth about our world, regardless of any audiovisual record?
PBK: Absolutely. Look, eight- or nine-tenths of Americans believe in angels, and most of the people in the world have some kind of faith in the idea that supernatural forces govern our destiny. So there are lots of epistemological challenges out there for us to consider. My focus is narrower, inspired by what journalist Hanna Rosin calls our “epistemic chasm of cuckoo.”* If you say something today, and then tomorrow say you never said it, we need some uniform ways of citing the audiovisual record of your having said it. This is especially urgent if you keep denying having said it, or if you are in government, or if you are meant to be responsible for people’s lives and well-being—or all of the above. We need a common and agreed-upon apparatus for anchoring truths and facts that we consume and distribute on screens. A cognate, in other words, for what the Chicago Manual publishes and has done for more than a century.
RH: A Manual of Style, then, that provides guidance and support for a future world in which audiovisual content has become just as easy to find, assess, search, quote/clip, and cite as text is now, in part because we’ll need to use video to counter false narratives. Which makes sense if, as you note on pages 218 and 219 of your book, “a third of all US adults under age thirty get their news through TikTok” (you cite a Pew Research Center report from November 2023 for this info). But doesn’t a lot of the “news” on TikTok consist of people filming themselves giving their opinions? And doesn’t TikTok have a vested interest in holding our attention? In other words, how much does video of the news factor into the popularity of TikTok relative to, say, the entertainment value of watching people perform for us?
PBK: Right. I’m not sure what you are watching, but I can see a lot of that, too. Still, clips from news shows and press conferences and music and sports and all kinds of other things, besides, zing around the internet at dizzying speed. TikTok, Instagram, X, YouTube, Bluesky—the list of platforms and apps is really, really long—and important stuff circulates there as much as anywhere. I used to teach documentary film in the summers to high schoolers, and students from around the world—China, Taiwan, Japan, South Korea—showed me video-sharing apps that may be even more popular than those! Sure, you can find video everywhere of people recording each other at concerts, or sharing the best double plays in baseball and great windmill dunks at Madison Square Garden, but there are also clips of leaders lying and federal agents dragging civilians into unmarked cars.

Many of the illustrations in The Moving Image feature still images accompanied by QR codes like these, which provide a bridge from printed page to video (or, as here, from screen to video). QR codes from left to right: Jawed Karim, “Me at the Zoo,” YouTube’s first video, April 23, 2005, 0:19; Ada Limón reading her poem “The End of Poetry,” Library of America, November 14, 2023, YouTube, 1:55; Gilbert Strang, “Lecture 1: The Geometry of Linear Equations,” MIT OpenCourseWare, Fall 1999, 39:48; “In the Event of Moon Disaster,” deepfake video, MIT Center for Advanced Virtuality, July 20, 2020, YouTube, 7:46.
RH: Real content, in other words, shares space with entertainment. But hasn’t the growth of online video been accompanied by—or even prefigured by—a shift from text on the page to text on the screen? This has happened not only in publishing but on social media, where the “text message” (a term that, in its current sense, dates to 1993) has evolved to replace the hand- and typewritten letters and notes that most of us past a certain age would have been sending to each other barely a generation ago.
PBK: Absolutely, text is moving from paper to screen now, and there are more screens in use every day. The Moving Image: A User’s Manual—this first edition, anyway (aren’t you guys on your 18th now?)—looks to provide encouragement and examples for how to cite audiovisual media wherever it is, including social media, and thus I think is part of a larger project that many of us are involved in to wend our way through and out of this epistemic chaos. To the extent that this chaos is accidentally created—with young users and influencers enthusiastic and passionate about social media, and millions of voices saying competing things—it’s a fun project. To the extent that the chaos is purposefully created—with corporate actors intensifying the addictive properties of their technologies, businesses lying about things they try to sell us, and political actors churning out falsehood after falsehood—it’s somewhat more urgent.
RH: Again, it sounds as if part of the challenge is the lack of boundaries—entertainment and advertising and news/information share the same “channels,” whether as text or video. You call on academic publishers, especially, to produce much more video—like the peer-reviewed video content at Nature and the lectures produced via OpenCourseWare at MIT. Do you think of such videos as being offered alongside the peer-reviewed articles and books that still dominate academic publishing? Or do you see videos replacing articles and books?
PBK: Both. “Replacing”—as in taking the place of—text and print won’t happen for centuries. If we say that movies ran for the first time around 1895, but we date print back to Gutenberg, then books and articles have had a four-hundred-year head start—and text before the codex even longer than that. There are libraries, university buildings, and archives around the world with names like Herodotus and Cervantes and Shakespeare etched on them in stone, whereas you don’t find buildings with the names of Claude Lanzmann or Martin Scorsese up on the porticos yet. But if you want to capture the attention of the generations coming now, in the classroom and in what you publish, you better figure out how to make the words speak and images move.
RH: What about cost and scale? To annotate a single episode of, say, All in the Family—identifying people on the screen and behind the scenes, naming the items on the set, filling in the historical context and technical details about production, citing related resources—would require a lot of research. And you’d need to build and maintain reliable platforms for adding all these annotations and all this metadata without compromising the integrity of the original content. Do you think generative AI will end up doing most of the work?
PBK: Have a look at Wikipedia entries on individual episodes of Family Guy or All in the Family—the knowledge is out there, and the human energy is there, too. AI, harnessed for good, can help us almost immeasurably. We have to figure out ways of watching the material in our audiovisual archives, and no one lives long enough for us to do it in what they call real time. Imagine computers scanning and logging the contents of hours of film in a minute, though. That’s actually something nice to conjure with.
RH: I do see that most episodes of Family Guy have their own Wikipedia page; All in the Family hasn’t generated as much interest and might require the input of historians at this point. I’ll take either or both—hive mind or historian—though I’m sure you’re right that AI will be needed simply to help us digest everything. I can see dedicating all this time and processing power to TV shows and movies (even the bad ones)—and anything else that was made to last—but I’m not so sure about a lot of the other stuff out there. Where would you draw the line when choosing what to save?
PBK: Probably right before you propose doing a video version of this interview! More seriously, the main thing to stress is not where to stop but where to start. There’s a treasury from all around the world of film, television, radio, and all kinds of audio and video featuring performances, experiments, human achievements and failures large and small—that’s meant for all of us to use. Most of it isn’t digital—and the models undergirding AI today haven’t ingested but a tiny piece of it all.
RH: There’s so much in your book, too, that we haven’t touched on here, from historical and technical details to challenges related to copyright (including a proposal for improving Creative Commons licenses) and more. Anyone who cares about the future of publishing—and the integrity of our collective historical record, both written and audiovisual—will want to take the time to read your book, right down to the last footnote (many of which are worth reading on their own).
PBK: Stand in the light, please—kindly say that to camera.
RH: I’d love to do that someday, if it means extending this conversation. Thanks so much for your time and for your thoughts on The Moving Image.
PBK: Such a pleasure. Thank you.
* Hanna Rosin, “The Insurrectionists Next Door,” The Atlantic, September 16, 2024.
~ ~ ~
Russell Harper is the editor of The Chicago Manual of Style Online Q&A and was the principal reviser of the 16th, 17th, and 18th editions of The Chicago Manual of Style. He also contributed to the 8th and 9th editions of Kate L. Turabian’s A Manual for Writers of Research Papers, Theses, and Dissertations.