A project of the Data Science Across the College initiative, directed by Timothy Beal and Mark Turner and supported by President (now Emerita) Barbara Snyder, to build on strengths and potentials across many fields, including the arts and humanities, social and behavioral sciences, and natural and mathematical sciences, to elevate data science and machine learning as a college-wide focus for research, scholarship, and curriculum development.

The Fall 2020 and Spring 2021 colloquium series is run in conjunction with h.lab and takes place on occasional Wednesdays, 3-4pm Eastern Time.

Wednesday 3-4pm Eastern. 28 October 2020

A live recorded zoom session.  Ask turner@case.edu for an invitation.

Speaker: Micah Saxton, Tufts University, & Michael Hemenway, AI Institute at Iliff School of Theology
Title: Getting Started with Coding for Humanities Scholars


Abstract: This workshop is an introduction to coding in Python for humanities research. We will start by showcasing a few kinds of Python projects and brainstorming about how principles or insights from these projects could be implemented in your own research. Following that, you will learn a few fundamentals to make you comfortable getting started with Python. Finally, we will provide resources you can use to learn Python on your own.


Wednesday 3-4pm Eastern. 4 November 2020

A live recorded zoom session.  Ask turner@case.edu for an invitation.

Speaker: Timothy Beal, Florence Harkness Professor of Religion.
Title: Build a Bot: A DIY Toy that Makes You Think

Abstract: In this session you will learn how to build a text bot in the coding language of Python. We will first make one that autogenerates its own verses based on the King James Version Bible. You can then take it home and experiment with other textual corpora (see, e.g., twitter.com/emilymarkovson, which uses the program to write and tweet four-line poems based on the complete works of Emily Dickinson). As fun as it is to play with, this program also provokes questions and issues about writing, probability, and what we might call “artificial creativity.”
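
The abstract does not say how the bot generates text. As a rough illustration only, the sketch below assumes a word-level Markov chain, one common way to build such a toy; it is not the program used in the session. The function names `build_chain` and `generate` are hypothetical, and the two sample sentences (Genesis 1:1-2, KJV) merely stand in for a full corpus.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def generate(chain, length=12, seed=None):
    """Walk the chain, picking each next word at random from the observed followers.

    Followers are stored with repetition, so frequent continuations are
    proportionally more likely to be chosen.
    """
    rng = random.Random(seed)
    keys = sorted(chain)           # sorted so a fixed seed gives a fixed result
    key = rng.choice(keys)
    output = list(key)
    while len(output) < length:
        followers = chain.get(key)
        if not followers:          # dead end: restart from a random key
            key = rng.choice(keys)
            output.extend(key)
            continue
        output.append(rng.choice(followers))
        key = tuple(output[-len(key):])
    return " ".join(output[:length])


if __name__ == "__main__":
    # Stand-in corpus; a real bot would load a whole text file instead.
    sample = (
        "In the beginning God created the heaven and the earth. "
        "And the earth was without form, and void; "
        "and darkness was upon the face of the deep."
    )
    print(generate(build_chain(sample), length=12, seed=1))
```

Raising `order` makes the output follow the source more closely; lowering it makes the output more surprising and less grammatical, which is one concrete way to explore the questions about probability and writing that the abstract raises.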


Wednesday 3-4pm Eastern. 18 November 2020

A live recorded zoom session.  Ask turner@case.edu for an invitation.

Justin Barber, AI Institute, Iliff School of Theology
Michael Hemenway, AI Institute, Iliff School of Theology
Title: Transformers: Analyzing Language with Deep Learning


Abstract: Deep learning has revolutionized how machines analyze, interpret, and generate language. Machines have indeed become important and accessible collaborators in the construction of meaning from language. Scholars and students in the humanities now have easy access to hundreds of cutting-edge deep learning models that can aid them in their work. Join us this week to acquaint yourself with some highly accessible programming tools that require very little coding, and consider with us how these tools might further and even transform our understanding and interpretation of human language.


Wednesday 3-4pm Eastern. 9 December 2020

A live recorded zoom session.  Ask turner@case.edu for an invitation.

Ken Singer, Ambrose Swasey Professor of Physics, with
Michael Hinczewski, Assistant Professor of Physics
Ina Martin, Senior Research Associate (Physics), Adjunct Faculty in the Department of Materials Science and Engineering
Betsy Bolman, Elsie B. Smith Professor in the Liberal Arts and Chair, Department of Art History and Art
Title: Data Science in Art: Discerning the Painter’s Hand

Abstract: The Departments of Art History and Art, Physics, Materials Science and Engineering, the Cleveland Museum of Art and the Cleveland Institute of Art have been collaborating to investigate the application of machine learning (ML) to artist attribution based on confocal optical profilometry data from student-produced paintings via the brushstroke texture. A convolutional neural network was applied to classify the surface topography among several students’ paintings. By specifying the painted subject and materials for the students, we were able to carry out a controlled study of various ML approaches including the efficacy of transfer learning as well as scaling, normalization and other pre-training analyses. We were able to confidently attribute paintings among multiple hands, with potentially significant implications for the art historical field of connoisseurship. To this end, we are now collaborating with the internationally renowned Factum Arte, in a project to apply ML to surface topography in order to test the ability of our techniques to distinguish among the hands of El Greco, his son Jorge, and members of his workshop. Additional avenues of collaboration include applying optical methods for producing non-destructive cross-sections of art works and other art conservation techniques.

Wednesday 3-4pm Eastern. 10 February 2021

A live recorded zoom session, perhaps with a synchronous f2f meeting, depending on pandemic conditions.  Ask turner@case.edu for an invitation.

Speaker: Peter Uhrig
Title: Introduction to the Red Hen Lab Rapid Annotator

Abstract: Red Hen’s Rapid Annotator provides a platform to users to annotate large chunks of data in a short span of time and with the least possible effort. Text, speech, paintings, sculpture, video, music—any kind of digitized data—can be presented to annotators for quick annotation, creating metadata that can then be searched and analyzed with typical data science methods. This introduction is entry-level and will take participants through the steps for loading, annotating, and analyzing data.

Wednesday 3-4pm Eastern. 24 February 2021

A live recorded zoom session, perhaps with a synchronous f2f meeting, depending on pandemic conditions.  Ask turner@case.edu for an invitation.

Speakers: Wenyue Xi and Mark Turner
Title: Data Science for FrameNet and Frame Blends


Abstract: FrameNet is a project used to tag text, speech, and images for conceptual frames. Turner will give a brief introduction to FrameNet and its use in both manual and automatic tagging. He will then introduce the phenomenon of Frame Blending in language and other forms of multimodal communication. Turner was the mentor for Wenyue Xi’s project during Red Hen Lab’s Google Summer of Code 2020, in which she developed a pipeline for “AI Recognizers of Frame Blends, Especially in Conversations About the Future.” Xi will give an overview of her work, in which she developed multiple algorithms that nominate various types of Frame Blends and built a preliminary system of the Frame Embedding method. Xi’s work also provides an interactive prototype of the human-in-the-loop Frame Blends Nomination System (using Red Hen’s Rapid Annotator for manual annotation). Xi provides video tutorials and a design document. Both the process and result are documented on the Red Hen Lab Techne Public Site, making this project available as a public source for the Red Hen Lab community. It provides a starting point for developers or users who may participate in the future.

Wednesday 3-4pm Eastern. 10 March 2021

A live recorded zoom session, perhaps with a synchronous f2f meeting, depending on pandemic conditions.  Ask turner@case.edu for an invitation.

Speaker: Tiago Torrent, Federal University of Juiz de Fora – FrameNet Brasil
Title: Fine-grained Semantic Representations for Multimodal Data Analysis

Abstract: TBD

Wednesday 3-4pm Eastern. 24 March 2021

A live recorded zoom session, perhaps with a synchronous f2f meeting, depending on pandemic conditions.  Ask turner@case.edu for an invitation.

Speaker: J. Elliott Casal, Research Scholar, Department of Cognitive Science, Case Western Reserve University
Title: Corpus-Based Genre Analysis in Writing Research and Pedagogy

Abstract: Research on discipline-specific written genre practices and related genre-based writing pedagogies have increasingly integrated corpus-based approaches to linguistic form with rhetorical approaches to writers’ functional aims. Such research draws on developments in genre theory, usage-based approaches to language learning, and at times Sociocultural Theory to analyze and teach linguistic resources in terms of their functional affordances for situated communicative practices, often with an emphasis on discipline-specific academic genre practices. This talk outlines the emerging body of corpus-based genre analysis research and related pedagogical intervention studies targeting novice disciplinary writers. In doing so, I use my doctoral dissertation (Casal, 2020) as an illustrative example of both corpus-based genre analysis research and corpus- and genre-based writing pedagogy. Emphasis is placed on the genre analysis, in which a set of formal linguistic features (reporting verbs, shell nouns, formulaic phrase-frames, and select measures of syntactically complex structures) were analyzed using a series of manual processes, custom Python scripts, and automated NLP/corpus tools across the rhetorical stages (operationalized as rhetorical moves) of 400 published research article introductions from two social science disciplines and two engineering disciplines. The talk will also briefly outline a related corpus- and concept-based pedagogical intervention which was carried out in a doctoral academic writing course.

Wednesday 3-4pm Eastern. 14 April 2021

A live recorded zoom session, perhaps with a synchronous f2f meeting, depending on pandemic conditions.  Ask turner@case.edu for an invitation.

Speaker: Cristóbal Pagán Cánovas, Ramón y Cajal Assistant Research Professor, Department of English Philology, University of Murcia; Alexander von Humboldt Fellow, Quantitative Linguistics, University of Tübingen.
Title: Machine learning for poetic creativity in oral traditional performance

Abstract: How do we learn to organize a language in chunks and to use those chunks creatively? Theories of chunking are based on abstract rules or on the storage of large numbers of exemplars. They view linguistic knowledge as linear combinations of discrete ‘chunks,’ such as phonemes or morphemes. The Parry-Lord theory of oral composition-in-performance argued that oral singers produce complex poems out of rehearsed improvisation through the mastery of a system of formulas, chunks that integrate phrasal, metrical, and semantic structures. Recently, computational linguistic models (Baayen et al.) based on discriminative learning propose that linguistic knowledge consists of statistical expectations within the complex dynamic system of cues and outcomes underlying language. Instead of discrete units, these computational models use a ‘wide’ learning algorithm with thousands of input units representing summaries of changes in acoustic frequency bands, and with proxies for distinctions in a lexical meaning vector space as output units. In this talk, I will reconsider formulaicity and creativity in oral poetic performance through these non-compositional models.

Previous colloquia

Thursday, 16 April 2020, 4-5 pm

A zoom webinar.  Ask turner@case.edu for an invitation.

Speaker: Erkki Somersalo
Professor, Department of Mathematics, Applied Mathematics, and Statistics.
Case Western Reserve University.
Title: Self-organizing map and imaging: New applications for an old method

View Recording

Abstract: The self-organizing map (SOM) is a classic example of early artificial neural network algorithms for dimension reduction and for organizing, visualizing, and analyzing data. The algorithm was proposed and developed by Teuvo Kohonen in the 1980s, and is sometimes referred to as a Kohonen map. The algorithm is heuristic, drawing ideas from the organization of neurons to perform special tasks and from, e.g., Hebbian learning models. It also constitutes a basis for certain classification algorithms based on feature vectors, such as learning vector quantization (LVQ). In this talk, the basic idea of SOM is explained, with the emphasis on the intuitive side of the approach related to the geometric vs. topological organization of data, and the algorithm is then applied to certain high dimensional imaging data, in particular, the hyperspectral imaging in remote sensing as well as texture analysis.
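
For readers who want to see the basic SOM idea in code, here is a minimal sketch of the standard online Kohonen update on a small rectangular grid. It is not the speaker's implementation: the function names, the linear decay of the learning rate and neighbourhood width, and the default parameter values are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols self-organizing map on a list of equal-length vectors.

    Assumed schedule: the learning rate decays linearly from lr0 to 0, and the
    neighbourhood width decays linearly from sigma0 down to a floor of 0.5.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One weight vector per grid node, initialised uniformly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in data space:
                # nodes near the winner on the map move most.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

Because neighbouring grid nodes are pulled toward the same inputs, nearby map cells end up representing similar data, which is the topological organization the abstract refers to. After training, `best_matching_unit(weights, x)` gives the map cell assigned to a new vector `x`.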

Thursday, 9 April 2020, 4-5 pm

A zoom webinar.  Ask turner@case.edu for an invitation.

Speakers: Members of the AI Institute at Iliff School of Theology
Title: ai.iliff – conversational ai for online learning

View Recording

Abstract: ai.iliff is a Henry Luce Foundation funded AI institute housed within the Iliff School of Theology in Denver, CO. ai.iliff developed out of several years of experience learning with machines as partners in the process of scholarship and research in the humanities. Building upon the recent advances in pre-trained language models for NLP tasks, we are using our TRUST model for ai design to build conversational  ai applications to enhance student learning in online education.

Thursday, 19 March 2020, 4-5 pm
A zoom webinar.  Ask turner@case.edu for an invitation.

Speaker: Peter Whitehouse. Peter J. Whitehouse, MD-PhD, has a primary appointment as Professor of Neurology, with secondary positions as Professor of Psychiatry, Cognitive Science, Neuroscience, and Organizational Behavior, and former appointments (but current interests) in Psychology, Bioethics, History, and Nursing at Case Western Reserve University. He is also currently Professor of Medicine at the University of Toronto, Honorary Research Fellow (zoology and aging), University of Oxford, and Founding President of Intergenerational Schools International. He is a card-carrying transdisciplinarian of the French variety. Major focuses of his current work have been age-associated cognitive challenges (formerly Alzheimer’s disease) and the nature of evidence and evidence of nature.
Title: If Big Data is the answer to Alzheimer’s, what is the question?

View Recording

Abstract: “Alzheimer’s” is for many a dominant individual and social concern. How do we gather and analyze evidence wisely to understand the phenomenology of aging-associated cognitive challenges and help people and communities suffering from the condition? Examined transdisciplinarily, what is the nature of evidence? Almost hypothesis-less Big Data is said to be the answer to “curing” Alzheimer’s. But as always, framing the questions and examining the words in them are the best places to start finding helpful answers. How does Alzheimer’s-type dementia relate to aging? Is it one condition? Is preserving brain health more about molecules and genes or communities and politics? What is the evidence for evidence and whose version anyway? What is the story of Big Data in the biopolitics of dementia? How do we create not AI but new symbionic/symbiotic intelligences? What is the bigger story, even grander narrative, we need to tell about the brain and aging in the emergent Anthropocene? Deconstructing the narrative of Alzheimer’s can be part of understanding our collective great derangement in failing to address the collapse of ecosystems and hence also potentially modern civilization. More importantly, how do we use data, narrative, and metaphor together to create not only knowledge but wisdom to reinvent ourselves and our societies to be more resilient and sustainable?

Thursday, 5 March 2020, 4-5 pm
A13 Crawford Hall

Speaker: Jing Li, Leonard Case Jr. Professor in Engineering and Interim Chair Department of Computer and Data Sciences
Moderator: Tim Beal
Title: Heterogeneous Network Analysis for Computational Drug Prediction

View Recording

Abstract: Heterogeneous networks have been widely used in modeling real-world complex systems and have been a powerful tool in studying complex biological problems. Link prediction in heterogeneous networks is one of the key computational problems. Efficient and effective algorithms for link prediction in heterogeneous networks are in great need. Furthermore, large scale network based integrative analyses that use multiple data sources have been a promising strategy for many applications in computational biology such as computational drug prediction. A key challenge in integrating multiple data sources is the lack of an extendable system that can effectively handle missing data from multiple sources. In this talk, I will go over some of our recent work in computational drug predictions based on multiple data sources. Many of the problems can be cast as missing link prediction problems on heterogeneous networks. Various approaches including random walk with restart, joint matrix-matrix decomposition, and joint tensor-matrix decomposition will be discussed.
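
Of the approaches named above, random walk with restart is the simplest to sketch. The code below is a generic textbook version on an unweighted, undirected graph, not the speaker's method: the node names, the restart probability of 0.3, and the helper names `build_graph` and `random_walk_with_restart` are all hypothetical. It scores every node by how often a walker starting at a seed node visits it, which turns "which unlinked drug is closest to this disease?" into a ranking problem.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs.

    Every node that appears in an edge has at least one neighbour.
    """
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in graph.items()}


def random_walk_with_restart(graph, seed, restart=0.3, tol=1e-12, max_iter=10000):
    """Iterate p <- (1 - restart) * W p + restart * e_seed until p stops changing.

    W spreads each node's probability evenly over its neighbours, and e_seed
    puts all restart mass back on the seed node.
    """
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0
    for _ in range(max_iter):
        updated = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            share = (1.0 - restart) * scores[node] / len(neighbors)
            for nbr in neighbors:
                updated[nbr] += share
        updated[seed] += restart   # total probability is 1, so restart mass is `restart`
        change = sum(abs(updated[n] - scores[n]) for n in graph)
        scores = updated
        if change < tol:
            break
    return scores


if __name__ == "__main__":
    # Hypothetical toy network mixing three node types: drugs, targets, a disease.
    edges = [
        ("disease_X", "target_2"),
        ("target_2", "drug_B"),
        ("drug_B", "target_1"),
        ("target_1", "drug_A"),
        ("drug_C", "target_3"),    # separate component, unreachable from disease_X
    ]
    scores = random_walk_with_restart(build_graph(edges), "disease_X")
    for node in sorted(scores, key=scores.get, reverse=True):
        print(f"{node:10s} {scores[node]:.4f}")
```

In a link-prediction setting, nodes with high scores that have no direct edge to the seed are the candidate "missing links". Real heterogeneous networks would normally use edge weights and type-specific transition probabilities, which this sketch leaves out.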

Thursday, 27 February 2020, 4-5 pm
A13 Crawford Hall

Speaker: Jennifer Hinnell is a Killam Laureate and PhD candidate in the Department of Linguistics at the University of Alberta under Dr. Sally Rice. Her research spans cognitive and corpus linguistics, multimodality, and gesture studies.
Title: Language in the body: Quantifying the multimodal signal in spontaneous discourse

View Recording

Abstract: Cognitive linguists have long acknowledged the role of embodiment and interaction on both the structure and meaning of language. However, until recently, movements of the body that accompany face-to-face interaction have not been included in the analysis of linguistic expressions. In my research, I investigate the role of the body as a critical part of linguistic meaning-making. I use the Red Hen archive, an international multimedia database of broadcast media, as well as motion capture data, to examine language use across a range of expressions in specific linguistic and conceptual domains. In addition to investigating the linguistic features involved in a particular utterance, I explore the usage patterns in the manual gestures, head movements, shoulder shrugs, postural shifts, eye-gaze, and brow movements. Using corpus annotation methodologies and quantitative and statistical analysis, I capture a range of linguistic and kinesic usage patterns that speakers produce with particular utterances. The research provides evidence for the coordinated and recurrent bodily enactment of grammatical and discourse-level expressions and addresses issues at the centre of multimodal research, such as the degree of convention and, by the same token, the degree of variation inherent in the kinesic signal. The research has implications for language documentation and description, especially of languages that rely predominantly on verbal and visual signals (e.g. signed and oral Indigenous languages). It also has applications in multimedia technologies, e.g. in virtual agents that rely on human-like language use and animated dialogue in films and video games.

Thursday, 20 February 2020, 4-5 pm
A13 Crawford Hall

Speaker: Weihong Guo.  Associate Professor, Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University
Title: High resolution image reconstruction and feature extraction

View Recording

Abstract: Images in two and higher dimensions are present in our daily life, ranging from pictures taken by smartphones to those obtained from medical devices. Recent developments in science and technology have caused a revolution in the generation, acquisition, analysis, processing, and visualization of images. Take medical imaging as an example. A variety of imaging modalities, e.g., computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), provide great potential to facilitate diagnosis and treatment. On the other hand, imaging problems, including reconstruction, enhancement, segmentation, and registration, are vital to many areas of science, medicine, and engineering. However, with the explosive growth of data, traditional methods in imaging face many challenges, e.g., how to reconstruct high quality/resolution images from indirect raw machine measurements, how to effectively enhance image quality, and how to extract features of interest from images. Appropriate models and efficient computational algorithms play a crucial role in imaging performance. I will present an overview of related courses we offer in the Mathematics, Applied Mathematics and Statistics department, along with my recent research results in these directions.

The results are based on collaboration with Yue Zhang (a former PhD student, now at Siemens Corporate Research), Professors Liang-Jian Deng (a former visiting PhD student, now at UESTC, China), Jocelyn Chanussot (Grenoble Institute of Technology, France), Ke Chen (Liverpool, UK) and Liam Burrows (Liverpool, UK).

Thursday, 13 February 2020, 3:30-4:30 pm
Freedman Center, Kelvin Smith Library
Meet and Greet: Love Data Week
Meet for casual discussions, with refreshments. Members of the Library team for Data Science and the Freedman Center for Digital Scholarship will attend.
Registration required. Click to register!

Thursday, 6 February 2020, 4-5 pm
A13 Crawford Hall

Speaker: Daniela Calvetti, The James Wood Williamson Professor of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University
Title: Meditating Brain: Analysis and Interpretation of MEG data during meditation

View Recording

Abstract: In this talk, we describe an ongoing research project to understand the fine scale spatio-temporal changes in brain activity during meditation. The raw data consist of hours of magnetoencephalography (MEG) recordings of professional meditators from the Theravada Buddhist tradition. The data were registered with one millisecond time resolution, the meditators alternating between focussed attention (Samatha), open monitoring, or mindfulness (Vipassana), and eyes closed resting state. The data are first processed to obtain a time series activity map of the brain, and subsequently interpreted by using data driven model reduction techniques. The work is part of a collaboration with researchers at CWRU (Daniela Calvetti, Erkki Somersalo, Brian Johnson (currently at Yelp)) and University of Rome “La Sapienza”, Italy (Annalisa Pascarella, Francesca Pitolli, Barbara Vantaggi).

Thursday, 30 January 2020, 4-5 pm
A13 Crawford Hall

Speaker: Tim Beal, Florence Harkness Professor of Religion, with Justin Barber and Michael Hemenway, AI Institute at Iliff School of Theology
Title: Face of the Deep: Perverse Engagements with Neural Machine Translation

View Recording

Abstract: The broad aim of this project is to explore new possibilities for translation within a post-print digital media environment. Working with emerging technologies of neural machine translation (NMT) and natural language processing (NLP) in the programming language of Python, we are exploring new models and methods for translating Hebrew biblical and other ancient texts. Whereas print translation pushes the translator toward closure, deciding on a single translation and relegating alternatives to footnotes or parentheses, how might new media technologies make it possible to provide readers/users access to the processes of translation, hosting an encounter that attends to the rich ambiguities and polyvocalities of the other text in translation? How might we deploy new media technologies in ways that radically alter translation, not only transforming the processes of translation but also involving users in those processes?

Our interests as humanities scholars in ambiguity, polyvocality, and the irreducible otherness of the text in translation fly in the face of the burgeoning industry of NMT (e.g., Google Translate). Whereas the consumer-oriented goal of NMT is to erase ambiguity and make the processes of translation invisible and immediate (so users barely realize translation is taking place), we aim to build models using NMT and other NLP tools perversely, to slow down and make visible the complex processes of translation, in order to invite users to participate in those processes.

Thursday, 23 January 2020, 4-5 pm
A13 Crawford Hall

Speaker: Roger French, Kyocera Professor, Materials Science & Engineering; Director, SDLE Research Center; Faculty Director, Applied Data Science Program
Title: Data Science and Machine Learning Applied to Silicon Photovoltaic Solar Panels: Doing Energy Science at Scale with Time-series and Image Datasets

View Recording

This talk will begin with a review of the Undergraduate Minor in Applied Data Science, which is directly available inside the College of Arts & Sciences, and a presentation of plans for the creation of a Graduate certificate.

Abstract: Advances in computing, communication, and data collection have facilitated collection of petabyte-scale datasets from which data-driven models can be built. This digital transformation affects society, industry, and academia, since data-driven models can challenge how things are done and offer new opportunities for developing how things work.

At CWRU we have offered the university-wide Applied Data Science (ADS) program since 2015. The ADS program teaches non-computer science students, producing “T-shaped” graduates with deep knowledge in their domain plus strong data science skills. The ADS program provides both an undergraduate minor and graduate level courses for which a University Certificate is being developed. ADS students learn the foundations: coding, inferential statistics, exploratory data analysis, modeling and prediction, and they complete a semester long data science project for their ADS portfolio. The courses are taught using a practicum approach, with an open data science toolchain consisting of R, Python, Git, Markdown, Machine Learning, and TensorFlow on GPUs.

We utilize data science and big-data analytics to address critical problems in energy science. As solar power grows, we need to fully understand and predict the power output of photovoltaic (PV) modules over their entire >30-year lifetimes. Degradation science [reference 1] combines data-driven statistical and machine learning with physical and chemical science to examine degradation mechanisms in order to improve PV materials and reduce system failures. We use distributed and high performance computing, based on Hadoop2 and the NoSQL HBase, to ingest, analyze, and model large volumes of time-series datasets from 3.4 GW of PV power plants [reference 2]. We have developed an automated image processing and deep learning pipeline applied to electroluminescent (EL) images of PV modules to identify degradation mechanisms and predict their associated power losses [reference 3]. Unbiased, data-driven analytics, now possible using data science methodologies, represents a new front in our research studies of critically important and complex systems.

1. R.H. French, et al., Degradation science: Mesoscopic evolution and temporal analytics of photovoltaic energy materials, Curr. Op. Sol. State & Matls. Sci. 19 (2015) 212–226.

2. Y. Hu, et al., A Nonrelational Data Warehouse for the Analysis of Field and Laboratory Data From Multiple Heterogeneous Photovoltaic Test Sites, IEEE Journal of Photovoltaics. 7 (2017) 230–236.

3. A. M. Karimi, et al., Automated Pipeline for Photovoltaic Module Electroluminescence Image Processing and Degradation Feature Classification, IEEE Journal of Photovoltaics. (2019) 1–12.

Thursday, 16 January 2020, 4-5 pm
A13 Crawford Hall

Speaker: Mark Turner and The Red Hen Team
Title: Big Data Science for Multimodal Communication—An Overview of the International Distributed Little Red Hen Lab.

View Recording

Abstract: The International Distributed Little Red Hen Lab™ is a global big data science laboratory and cooperative for research into multimodal communication. Red Hen’s main goal is theory of multimodal communication. See Overview of the Red Hen Vision and Program. Red Hen’s secondary goal is the development of computational, statistical, and technical tools for big data science on multimodal communication. See e.g. Red Hen Lab’s Google Summer of Code 2019 Ideas page and Projects page. Red Hen’s tertiary goal is pedagogy: see her Τέχνη Public Site—Red Hen Lab’s Learning Environment

To Be Scheduled

To Be Scheduled

Speaker: Kelly McMann is Professor of Political Science and Director of the International Studies Program at Case Western Reserve University.
Title: Varieties of Democracy (V-Dem):  Big Data in the Social Sciences

Abstract: Where a country’s political regime falls along the authoritarian-democratic spectrum has a significant impact on its citizens’ lives and its interactions with those outside its borders. Yet, research about political regimes has faced severe data limitations. To try to uncover and understand broad trends and relationships, scholars have had to rely on datasets with limited global and temporal coverage, few indicators, and questionable validity. In response to these limitations a group of scholars, including the speaker, created Varieties of Democracy (V-Dem). V-Dem is a dataset of more than 450 indicators of political regimes in all countries of the world from 1789 to the present using a transparent, rigorous methodology. The V-Dem dataset has been available for free on the internet since 2016 and is updated annually. The dataset is being used by the World Bank, the United Nations General Assembly, the U.S. Agency for International Development, among other organizations, and scholars around the world. The dataset has been downloaded more than 100,000 times in more than 150 countries since its first release. The speaker, a project manager for V-Dem, will describe the obstacles to developing big data in the social sciences, the challenges and solutions to creating the V-Dem dataset, and the utility of the dataset outside of the social sciences.

To Be Scheduled

Speaker: Shannon French is the Inamori Professor in Ethics, Director of the Inamori International Center for Ethics and Excellence, and a tenured member of the Philosophy Department with a secondary appointment in the law school at Case Western Reserve University.
Moderator: Tim Beal