A project of the Data Science Across the College initiative, directed by Timothy Beal and Mark Turner and supported by President Barbara Snyder, to build on strengths and potentials across many fields, including the arts and humanities, social and behavioral sciences, and natural and mathematical sciences, to elevate data science and machine learning as a college-wide focus for research, scholarship, and curriculum development.

The Fall 2020 colloquium is devoted to presentations affiliated with h.lab.


Tuesday, Date to be determined, 4-5 pm.

A Zoom webinar. Ask turner@case.edu for an invitation.

Speaker: Tim Beal, Florence Harkness Professor of Religion.
Title: TBA

Abstract: TBA


Previous colloquia

Thursday, 16 April 2020, 4-5 pm

A Zoom webinar. Ask turner@case.edu for an invitation.

Speaker: Erkki Somersalo
Professor, Department of Mathematics, Applied Mathematics, and Statistics.
Case Western Reserve University.
Title: Self-organizing map and imaging: New applications for an old method

View Recording

Abstract:
The self-organizing map (SOM) is a classic example of early artificial neural network algorithms for dimension reduction and for organizing, visualizing, and analyzing data. The algorithm was proposed and developed by Teuvo Kohonen in the 1980s, and is sometimes referred to as the Kohonen map. The algorithm is heuristic, drawing ideas from the organization of neurons to perform special tasks and from, e.g., Hebbian learning models. It also constitutes a basis for certain classification algorithms based on feature vectors, such as learning vector quantization (LVQ). In this talk, the basic idea of SOM is explained, with emphasis on the intuitive side of the approach related to the geometric vs. topological organization of data, and the algorithm is then applied to certain high-dimensional imaging data, in particular hyperspectral imaging in remote sensing as well as texture analysis.
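For readers unfamiliar with the method, the following is a minimal Python sketch of the basic SOM training loop, not the speaker's code. It assumes a one-dimensional map, a Gaussian neighbourhood, and linearly decaying learning rate and neighbourhood width; the function names and all parameter defaults are illustrative choices, not taken from the talk.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is nearest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D self-organizing map on a list of equal-length vectors.

    All defaults are illustrative assumptions, not values from the talk.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit's weight vector from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = data[:]
        rng.shuffle(samples)
        for x in samples:
            # Learning rate and neighbourhood width both decay over time.
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for j in range(n_units):
                # Units close to the best-matching unit on the map move more.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
            step += 1
    return weights
```

After training, calling best_matching_unit(weights, x) assigns a data vector to a position on the map, so similar vectors tend to land on the same or neighbouring units.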

Thursday, 9 April 2020, 4-5 pm

A Zoom webinar. Ask turner@case.edu for an invitation.

Speakers: Members of the AI Institute at Iliff School of Theology
Title: ai.iliff – conversational ai for online learning

View Recording

Abstract: ai.iliff is a Henry Luce Foundation-funded AI institute housed within the Iliff School of Theology in Denver, CO. ai.iliff developed out of several years of experience learning with machines as partners in the process of scholarship and research in the humanities. Building upon recent advances in pre-trained language models for NLP tasks, we are using our TRUST model for ai design to build conversational ai applications to enhance student learning in online education.

Thursday, 19 March 2020, 4-5 pm
A Zoom webinar. Ask turner@case.edu for an invitation.

Speaker: Peter Whitehouse. Peter J. Whitehouse, MD-PhD, has a primary appointment as Professor of Neurology, with secondary positions as Professor of Psychiatry, Cognitive Science, Neuroscience, and Organizational Behavior, and former appointments (but current interests) in Psychology, Bioethics, History, and Nursing at Case Western Reserve University. He is also currently Professor of Medicine at the University of Toronto, Honorary Research Fellow (zoology and aging) at the University of Oxford, and Founding President of Intergenerational Schools International. He is a card-carrying transdisciplinarian of the French variety. Major focuses of his current work have been age-associated cognitive challenges (formerly Alzheimer’s disease) and the nature of evidence and evidence of nature.
Title: If Big Data is the answer to Alzheimer’s, what is the question?

View Recording

Abstract: “Alzheimer’s” is for many a dominant individual and social concern. How do we gather and analyze evidence wisely to understand the phenomenology of age-associated cognitive challenges and help people and communities suffering from the condition? Examined transdisciplinarily, what is the nature of evidence? Almost hypothesis-less Big Data is said to be the answer to “curing” Alzheimer’s. But as always, framing the questions and examining the words in them are the best places to start finding helpful answers. How does Alzheimer’s-type dementia relate to aging? Is it one condition? Is preserving brain health more about molecules and genes or communities and politics? What is the evidence for evidence, and whose version anyway? What is the story of Big Data in the biopolitics of dementia? How do we create not AI but new symbionic/symbiotic intelligences? What is the bigger story, even grander narrative, we need to tell about the brain and aging in the emergent Anthropocene? Deconstructing the narrative of Alzheimer’s can be part of understanding our collective great derangement in failing to address the collapse of ecosystems and hence also potentially modern civilization. More importantly, how do we use data, narrative, and metaphor together to create not only knowledge but wisdom to reinvent ourselves and our societies to be more resilient and sustainable?

Thursday, 5 March 2020, 4-5 pm
A13 Crawford Hall

Speaker: Jing Li, Leonard Case Jr. Professor in Engineering and Interim Chair, Department of Computer and Data Sciences
Moderator: Tim Beal
Title: Heterogeneous Network Analysis for Computational Drug Prediction

View Recording

Abstract: Heterogeneous networks have been widely used in modeling real-world complex systems and have been a powerful tool in studying complex biological problems. Link prediction in heterogeneous networks is one of the key computational problems. Efficient and effective algorithms for link prediction in heterogeneous networks are greatly needed. Furthermore, large-scale, network-based integrative analyses that use multiple data sources have been a promising strategy for many applications in computational biology, such as computational drug prediction. A key challenge in integrating multiple data sources is the lack of an extendable system that can effectively handle missing data from multiple sources. In this talk, I will go over some of our recent work in computational drug prediction based on multiple data sources. Many of the problems can be cast as missing link prediction problems on heterogeneous networks. Various approaches, including random walk with restart, joint matrix-matrix decomposition, and joint tensor-matrix decomposition, will be discussed.
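As a rough illustration of one approach named in the abstract, random walk with restart, here is a minimal Python sketch. It is not the speaker's implementation: the function name, the restart probability of 0.3, and the assumption of an undirected, unweighted graph in which every node has at least one neighbour are all illustrative choices. Nodes that receive a high score but have no direct edge to the seed can be read as candidate missing links.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Approximate steady-state visiting probabilities for a walker that
    jumps back to `seed` with probability `restart` at every step.

    adj maps each node to a list of its neighbours. Assumes an undirected,
    unweighted graph where every node has at least one neighbour.
    """
    nodes = sorted(adj)
    # Start with all probability mass on the seed node.
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # With probability (1 - restart) the walker moves to a
            # uniformly chosen neighbour of its current node.
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        # With probability `restart` the walker returns to the seed.
        nxt[seed] += restart
        # Stop once the distribution no longer changes noticeably.
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p
```

On a hypothetical heterogeneous graph whose nodes are drugs, targets, and diseases, one would seed the walk at a drug node and rank the disease nodes by their resulting scores.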

Thursday, 27 February 2020, 4-5 pm
A13 Crawford Hall

Speaker: Jennifer Hinnell is a Killam Laureate and PhD candidate in the Department of Linguistics at the University of Alberta under Dr. Sally Rice. Her research spans cognitive and corpus linguistics, multimodality, and gesture studies.
Title: Language in the body: Quantifying the multimodal signal in spontaneous discourse

View Recording

Abstract: Cognitive linguists have long acknowledged the role of embodiment and interaction in both the structure and meaning of language. However, until recently, movements of the body that accompany face-to-face interaction have not been included in the analysis of linguistic expressions. In my research, I investigate the role of the body as a critical part of linguistic meaning-making. I use the Red Hen archive, an international multimedia database of broadcast media, as well as motion capture data, to examine language use across a range of expressions in specific linguistic and conceptual domains. In addition to investigating the linguistic features involved in a particular utterance, I explore the usage patterns in manual gestures, head movements, shoulder shrugs, postural shifts, eye-gaze, and brow movements. Using corpus annotation methodologies and quantitative and statistical analysis, I capture a range of linguistic and kinesic usage patterns that speakers produce with particular utterances. The research provides evidence for the coordinated and recurrent bodily enactment of grammatical and discourse-level expressions and addresses issues at the centre of multimodal research, such as the degree of convention and, by the same token, the degree of variation inherent in the kinesic signal. The research has implications for language documentation and description, especially of languages that rely predominantly on verbal and visual signals (e.g. signed and oral Indigenous languages). It also has applications in multimedia technologies, e.g. in virtual agents that rely on human-like language use and animated dialogue in films and video games.

Thursday, 20 February 2020, 4-5 pm
A13 Crawford Hall

Speaker: Weihong Guo, Associate Professor, Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University
Title: High resolution image reconstruction and feature extraction

View Recording

Abstract: Images in two and higher dimensions are present in our daily life, ranging from pictures taken by smartphones to those obtained from medical devices. Recent developments in science and technology have caused a revolution in the generation, acquisition, analysis, processing, and visualization of images. Take medical imaging as an example. A variety of imaging modalities, e.g., computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), provide great potential to facilitate diagnosis and treatment. On the other hand, imaging problems, including reconstruction, enhancement, segmentation, and registration, are vital to many areas of science, medicine, and engineering. However, with the explosive growth of data, traditional methods in imaging face many challenges, e.g., how to reconstruct high quality/resolution images from indirect raw machine measurements, how to effectively enhance image quality, and how to extract features of interest from images. Appropriate models and efficient computational algorithms play a crucial role in imaging performance. I will present an overview of related courses we offer in the Mathematics, Applied Mathematics, and Statistics department and my recent research results in these directions.

The results are based on collaboration with Yue Zhang (a former PhD student, now at Siemens Corporate Research), Professors Liang-Jian Deng (a former visiting PhD student, now at UESTC, China), Jocelyn Chanussot (Grenoble Institute of Technology, France), Ke Chen (Liverpool, UK), and Liam Burrows (Liverpool, UK).


Thursday, 13 February 2020, 3:30-4:30 pm
Freedman Center, Kelvin Smith Library
Meet and Greet: Love Data Week
Meet for casual discussions, with refreshments. Members of the Library team for Data Science and the Freedman Center for Digital Scholarship will attend.
Registration required. Click to register!

Thursday, 6 February 2020, 4-5 pm
A13 Crawford Hall

Speaker: Daniela Calvetti, The James Wood Williamson Professor of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University
Title: Meditating Brain: Analysis and Interpretation of MEG data during meditation

View Recording

Abstract:
In this talk, we describe an ongoing research project to understand the fine-scale spatio-temporal changes in brain activity during meditation. The raw data consist of hours of magnetoencephalography (MEG) recordings of professional meditators from the Theravada Buddhist tradition. The data were registered with one-millisecond time resolution, the meditators alternating between focussed attention (Samatha), open monitoring, or mindfulness (Vipassana), and eyes-closed resting state. The data are first processed to obtain a time series activity map of the brain, and subsequently interpreted by using data-driven model reduction techniques. The work is part of a collaboration with researchers at CWRU (Daniela Calvetti, Erkki Somersalo, Brian Johnson (currently at Yelp)) and the University of Rome “La Sapienza”, Italy (Annalisa Pascarella, Francesca Pitolli, Barbara Vantaggi).

Thursday, 30 January 2020, 4-5 pm
A13 Crawford Hall

Speaker: Tim Beal, Florence Harkness Professor of Religion, with Justin Barber and Michael Hemenway, AI Institute at Iliff School of Theology
Title: Face of the Deep: Perverse Engagements with Neural Machine Translation

View Recording

Abstract: The broad aim of this project is to explore new possibilities for translation within a post-print digital media environment. Working with emerging technologies of neural machine translation (NMT) and natural language processing (NLP) in the programming language of Python, we are exploring new models and methods for translating Hebrew biblical and other ancient texts. Whereas print translation pushes the translator toward closure, deciding on a single translation and relegating alternatives to footnotes or parentheses, how might new media technologies make it possible to provide readers/users access to the processes of translation, hosting an encounter that attends to the rich ambiguities and polyvocalities of the other text in translation? How might we deploy new media technologies in ways that radically alter translation, not only transforming the processes of translation but also involving users in those processes?

Our interests as humanities scholars in ambiguity, polyvocality, and the irreducible otherness of the text in translation fly in the face of the burgeoning industry of NMT (e.g., Google Translate). Whereas the consumer-oriented goal of NMT is to erase ambiguity and make the processes of translation invisible and immediate (so users barely realize translation is taking place), we aim to build models using NMT and other NLP tools perversely, to slow down and make visible the complex processes of translation, in order to invite users to participate in those processes.

Thursday, 23 January 2020, 4-5 pm
A13 Crawford Hall

Speaker: Roger French, Kyocera Professor, Materials Science & Engineering; Director, SDLE Research Center; Faculty Director, Applied Data Science Program
Title: Data Science and Machine Learning Applied to Silicon Photovoltaic Solar Panels: Doing Energy Science at Scale with Time-series and Image Datasets

View Recording

This talk will begin with a review of the Undergraduate Minor in Applied Data Science, which is directly available inside the College of Arts & Sciences, and a presentation of plans for the creation of a Graduate certificate.

Abstract: Advances in computing, communication, and data collection have facilitated collection of petabyte-scale datasets from which data-driven models can be built. This digital transformation affects society, industry, and academia, since data-driven models can challenge how things are done and offer new opportunities for developing how things work.

At CWRU we have offered the university-wide Applied Data Science (ADS) program since 2015. The ADS program teaches non-computer science students, producing “T-shaped” graduates with deep knowledge in their domain plus strong data science skills. The ADS program provides both an undergraduate minor and graduate-level courses for which a University Certificate is being developed. ADS students learn the foundations (coding, inferential statistics, exploratory data analysis, modeling, and prediction), and they complete a semester-long data science project for their ADS portfolio. The courses are taught using a practicum approach, with an open data science toolchain consisting of R, Python, Git, Markdown, Machine Learning, and TensorFlow on GPUs.

We use data science and big-data analytics to address critical problems in energy science. As solar power grows, we need to fully understand and predict the power output of photovoltaic (PV) modules over their entire >30-year lifetimes. Degradation science [reference 1] combines data-driven statistical and machine learning with physical and chemical science to examine degradation mechanisms in order to improve PV materials and reduce system failures. We use distributed and high-performance computing, based on Hadoop2 and the NoSQL HBase, to ingest, analyze, and model large volumes of time-series datasets from 3.4 GW of PV power plants [reference 2]. We have developed an automated image processing and deep learning pipeline applied to electroluminescent (EL) images of PV modules to identify degradation mechanisms and predict their associated power losses [reference 3]. Unbiased, data-driven analytics, now possible using data science methodologies, represents a new front in our research studies of critically important and complex systems.

References
1. R.H. French, et al., Degradation science: Mesoscopic evolution and temporal analytics of photovoltaic energy materials, Curr. Op. Sol. State & Matls. Sci. 19 (2015) 212–226.

2. Y. Hu, et al., A Nonrelational Data Warehouse for the Analysis of Field and Laboratory Data From Multiple Heterogeneous Photovoltaic Test Sites, IEEE Journal of Photovoltaics. 7 (2017) 230–236.

3. A. M. Karimi, et al., Automated Pipeline for Photovoltaic Module Electroluminescence Image Processing and Degradation Feature Classification, IEEE Journal of Photovoltaics. (2019) 1–12.

Thursday, 16 January 2020, 4-5 pm
A13 Crawford Hall

Speaker: Mark Turner and The Red Hen Team
Title: Big Data Science for Multimodal Communication—An Overview of the International Distributed Little Red Hen Lab.

View Recording

Abstract: The International Distributed Little Red Hen Lab™ is a global big data science laboratory and cooperative for research into multimodal communication. Red Hen’s main goal is theory of multimodal communication. See Overview of the Red Hen Vision and Program. Red Hen’s secondary goal is the development of computational, statistical, and technical tools for big data science on multimodal communication. See e.g. Red Hen Lab’s Google Summer of Code 2019 Ideas page and Projects page. Red Hen’s tertiary goal is pedagogy: see her Τέχνη Public Site—Red Hen Lab’s Learning Environment



To Be Scheduled

To Be Scheduled

Speakers:
Ken Singer, Ambrose Swasey Professor of Physics, with
Michael Hinczewski, Assistant Professor of Physics
Ina Martin, Senior Research Associate (Physics), Adjunct Faculty in the Department of Materials Science and Engineering
Betsy Bolman, Elsie B. Smith Professor in the Liberal Arts and Chair, Department of Art History and Art
Title: Data Science in Art: Discerning the Painter’s Hand

Abstract: The Departments of Art History and Art, Physics, Materials Science and Engineering, the Cleveland Museum of Art, and the Cleveland Institute of Art have been collaborating to investigate the application of machine learning (ML) to artist attribution based on confocal optical profilometry data from student-produced paintings via the brushstroke texture. A convolutional neural network was applied to classify the surface topography among several students’ paintings. By specifying the painted subject and materials for the students, we were able to carry out a controlled study of various ML approaches, including the efficacy of transfer learning as well as scaling, normalization, and other pre-training analyses. We were able to confidently attribute paintings among multiple hands, with potentially significant implications for the art historical field of connoisseurship. To this end, we are now collaborating with the internationally renowned Factum Arte in a project to apply ML to surface topography in order to test the ability of our techniques to distinguish among the hands of El Greco, his son Jorge, and members of his workshop. Additional avenues of collaboration include applying optical methods for producing non-destructive cross-sections of artworks and other art conservation techniques.




To Be Scheduled

Speaker: Kelly McMann is Professor of Political Science and Director of the International Studies Program at Case Western Reserve University.
Title: Varieties of Democracy (V-Dem):  Big Data in the Social Sciences

Abstract: Where a country’s political regime falls along the authoritarian-democratic spectrum has a significant impact on its citizens’ lives and its interactions with those outside its borders. Yet, research about political regimes has faced severe data limitations. To try to uncover and understand broad trends and relationships, scholars have had to rely on datasets with limited global and temporal coverage, few indicators, and questionable validity. In response to these limitations a group of scholars, including the speaker, created Varieties of Democracy (V-Dem). V-Dem is a dataset of more than 450 indicators of political regimes in all countries of the world from 1789 to the present using a transparent, rigorous methodology. The V-Dem dataset has been available for free on the internet since 2016 and is updated annually. The dataset is being used by the World Bank, the United Nations General Assembly, the U.S. Agency for International Development, among other organizations, and scholars around the world. The dataset has been downloaded more than 100,000 times in more than 150 countries since its first release. The speaker, a project manager for V-Dem, will describe the obstacles to developing big data in the social sciences, the challenges and solutions to creating the V-Dem dataset, and the utility of the dataset outside of the social sciences.

To Be Scheduled

Speaker: Shannon French is the Inamori Professor in Ethics, Director of the Inamori International Center for Ethics and Excellence, and a tenured member of the Philosophy Department with a secondary appointment in the law school at Case Western Reserve University.
Moderator: Tim Beal
Title:

Abstract: