Title: "Representing Knowledge through Word and Graph Embeddings"
Libraries represent large quantities of knowledge so as to make it broadly useful and available. Similarly, word and graph embeddings (e.g., word2vec) provide powerful ways to reduce large text corpora to concise features readily applicable to a variety of problems in NLP and data science. I will introduce word embeddings and apply them in a variety of new and interesting directions, including:
(1) Multilingual NLP -- The Polyglot project (www.polyglot-NLP.com) uses deep learning and other techniques to build a basic NLP pipeline (including entity recognition, POS tagging, and sentiment analysis) for over 100 languages. We train our systems on each language's Wikipedia edition. Wikipedia provides a uniform data resource even where no explicitly annotated data exists, but it poses substantial challenges for interpretation and evaluation.
(2) Detecting Historical Shifts in Word Meaning -- Words like "gay" and "mouse" have shifted substantially in meaning over time in response to societal and technological change. We detect these changes by training word embeddings on texts drawn from different time periods and comparing how each word is represented across periods. This work is part of our broader effort in historical trends analysis.
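The sketch below illustrates one simple way to compare a word across two time periods. It finds the word's nearest neighbors in each period's embedding space and measures how much those neighbor sets differ. This is an assumed, swapped-in technique, not the speakers' published method. Published approaches often first align the two embedding spaces (for example with an orthogonal Procrustes rotation), whereas comparing neighbor sets sidesteps alignment entirely. The functions `nearest_neighbors` and `neighborhood_shift` are hypothetical names. The 2-D vectors in `emb_1950` and `emb_2010` are invented toy data, not trained embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(word, emb, k):
    """Return the set of k words in emb most cosine-similar to word (excluding itself)."""
    scores = [(cosine(emb[word], vec), other) for other, vec in emb.items() if other != word]
    scores.sort(reverse=True)  # highest similarity first
    return {other for _, other in scores[:k]}

def neighborhood_shift(word, emb_old, emb_new, k=2):
    """1 minus the Jaccard overlap of the word's k-NN sets in two periods.

    0.0 means the neighborhood is unchanged; 1.0 means it changed completely.
    (Hypothetical helper: a simple proxy for semantic change, not a published method.)
    """
    old = nearest_neighbors(word, emb_old, k)
    new = nearest_neighbors(word, emb_new, k)
    return 1.0 - len(old & new) / len(old | new)

# Invented toy embeddings: only the vector for "mouse" moves between periods.
emb_1950 = {
    "mouse": [1.0, 0.1], "rat": [0.9, 0.2], "cheese": [0.8, 0.3],
    "keyboard": [0.1, 1.0], "computer": [0.2, 0.9],
}
emb_2010 = dict(emb_1950, mouse=[0.15, 1.0])

print(neighborhood_shift("mouse", emb_1950, emb_2010, k=2))
```

In this toy data, the two nearest neighbors of "mouse" change from {rat, cheese} to {keyboard, computer}, so its shift score is 1.0. The nearest neighbor of "cheese" stays "rat", so with `k=1` its score is 0.0. Real analyses would use embeddings trained on each period's corpus and a much larger `k`.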
(3) Feature Extraction from Graphs -- We present DeepWalk, our widely adopted approach for learning latent representations of the vertices in a network. DeepWalk learns embeddings from the local information in truncated random walks, treating each walk as the equivalent of a sentence in a language. It is suitable for a broad class of applications, such as network classification and anomaly detection. We also introduce new graph embedding techniques based on random projections, which produce embeddings of DeepWalk quality thousands of times faster than previous algorithms.
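The sketch below shows the "walks as sentences" idea: it generates truncated uniform random walks over a graph and extracts the (center, context) pairs a skip-gram model would train on. It is an illustrative assumption rather than the DeepWalk reference implementation. The names `random_walks` and `skipgram_pairs` and the parameter defaults (`walks_per_node=10`, `walk_length=8`, `window=2`) are hypothetical choices. The skip-gram training step that turns these pairs into vectors is omitted. One option for that step is passing the walks as sentences to a word2vec implementation such as gensim's `Word2Vec` with `sg=1`.

```python
import random

def random_walks(adj, walks_per_node=10, walk_length=8, seed=0):
    """Generate truncated uniform random walks; each walk is one 'sentence'.

    adj maps each vertex to a list of its neighbors.
    (Hypothetical helper; parameter defaults are illustrative only.)
    """
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start vertices in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: truncate the walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(v) for v in walk])  # vertices become word-like tokens
    return walks

def skipgram_pairs(walks, window=2):
    """List the (center, context) token pairs within `window` positions, as in skip-gram."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
walks = random_walks(adj)
pairs = skipgram_pairs(walks)
print(len(walks), walks[0])
```

Because vertices that appear near each other in many walks yield many shared context pairs, a skip-gram model trained on these pairs tends to place them close together in the embedding space. That is the intuition behind treating walks as sentences.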
Biosketch:
Dr. Steven Skiena is Distinguished Teaching Professor of Computer Science and Director of the Institute for AI-Driven Discovery and Innovation at Stony Brook University. His research interests include data science, bioinformatics, and algorithms. He is the author of six books, including "The Algorithm Design Manual", "The Data Science Design Manual", and "Who's Bigger: Where Historical Figures Really Rank".
Dr. Skiena received his Ph.D. in Computer Science from the University of Illinois in 1988. He is the author of over 150 technical papers. He is a Fellow of the American Association for the Advancement of Science (AAAS), a former Fulbright scholar, and recipient of the ONR Young Investigator Award and the IEEE Computer Science and Engineering Teaching Award. More info is available at http://www.cs.stonybrook.edu/~skiena/.
Guest Speaker: Dr. Steven Skiena, Distinguished Teaching Professor of Computer Science and Director of the Institute for AI-Driven Discovery and Innovation at Stony Brook University
Date: Tuesday, February 11, 2020
Time: 1pm-2pm
Location: Special Collections Seminar Room, E-2340, second floor of the Melville Library
Library Administration: 631.632.7100
Except where otherwise noted, this work by SBU Libraries is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.