
Deep Exploration and Filtering of Text (DEFT)

Automated, deep natural-language processing (NLP) technology may hold a solution for processing text information more efficiently and for revealing connections in text that might not be readily apparent to humans. DARPA created the Deep Exploration and Filtering of Text (DEFT) program to harness the power of NLP. Sophisticated artificial intelligence of this nature has the potential to enable defense analysts to efficiently investigate orders of magnitude more documents, which would enable discovery of implicitly expressed, actionable information within those documents.

Program Manager: Mr. Boyan Onyshkevych


The content below has been generated by organizations that are partially funded by DARPA; the views and conclusions contained therein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

Last updated: November 13, 2015

University of Washington (publications) ReVerb Relation Extractor NLP Automatically identifies and extracts binary relationships from English sentences. Relations are lexical, i.e. they consist of a predicate (verb phrase) and its two noun phrase arguments. (Java/Scala) ALv2, CPL, In House
University of Washington (publications) OLLIE Relation Extractor NLP Extracts relations among sentences. Relations are lexical, i.e. they consist of a predicate and arguments as in 1.1 and also include lexical information about attribution and the enabling condition. (Java/Scala) ALv2, MIT, BSDv2, In house
University of Massachusetts Universal Schema Relation Extraction NLP Extracts relations between entities in a large document collection. Relations are drawn from an ontology (e.g. attendedSchool). (Scala) ASL
University of Massachusetts Coreference within a Document NLP Identifies coreferences (group of equivalent items) within a document. Coreference is currently performed only for selected Entity types. (Scala) ASL
University of Massachusetts Cross Document Coreference with Resolution to Knowledge Base (KB) NLP Identifies coreferences (group of equivalent items) within a document and resolves the coreferences with a KB. Coreference is currently performed for People and Organizations. (Scala) ASL
University of Texas - Austin (publications) Bayesian Logic Programs for Textual Inference NLP Infers new relations based on a given set of relations and documents. Relations are drawn from an ontology (e.g. attendedSchool). (Java) MIT
QCCUNY Labeled N-Ary Relations between Typed Text Spans NLP Extracts (binary) relations from the document. Relations are drawn from an ontology (e.g. attendedSchool). (Java) GPL, LGPL, BSD, ASL, MIT, CPL, SUN
QCCUNY Speech Segmentation (Prosodic Analysis for Anomaly Analysis) NLP Assigns segmentation at the level of sentences, prosodic phrases or discourse units as an ordered list of start time-end time pairs. (Java) GPLv3
QCCUNY Anomaly Analysis (Prosodic Analysis for Anomaly Analysis) NLP Annotates each speech segment with confidence values denoting how uncertain the speaker is or how novel the information contained in the unit is. ASL
Columbia University (publications) Modality-CB Tagger NLP Tags the modality of predicates (verbs) into one of six classes (ability, effort, intention, success, want, belief) and for those of the speaker/author of the text toward each proposition in the text. (Java) InHouse, ASL, CPL, EPL, LGPL, GPL, BSD, MIT
Columbia University (publications) Sarcasm Detection NLP Judges whether a sentence is sarcastic or not. (Java) Modified BSD license (LIBSVM)
Columbia University (publications) Interaction: Acoustic Analysis NLP Generates a list of acoustic/prosodic characteristics from the input. (Java/C++) GPL
Columbia University (publications) Opinion NLP Tags phrases in sentences for subjectivity and polarity. (Java) GPL, ASL, BSD, In house
Columbia University (publications) Social Event Extraction NLP Labels interaction events and their participants. There are at least two types: Interaction and Cognition. (Java/C++) ALv2, In house
Oregon State University (publications) Cross Document Co-reference of Relations and Entities NLP Extracts relations, entities, and co-reference information across a set of documents. Relations are drawn from an ontology (e.g. attendedSchool). The initial focus will be on people and organizations. (Java) GPL
CMUCS (publications) CMUCS-coref-hovy-mitamura NLP Performs coreference of events (verbs and nominalizations). Coreference is typed (i.e. identity, part-of, member-of). Limited set of epistemic values attached to each event. (Java) CMUCS
CMUCS (publications) Novelty Detection NLP Assigns either a label or a numerical score of novelty in the input stream. CMUCS
CMUCS (publications) Event Ontology Creation NLP Generates an ontology of possible events. Event is a list of entities, each associated with statistical distribution over associated verbs and subject/object positions relative to verb. Events also include a description of pairwise relationships between entities (e.g., coreferent or other). Per communication with the PI, this algorithm will be delayed. CMUCS
CMUCS (publications) Knowledge Resolver NLP Processes a document and a set of hypotheses to generate the most coherent subset of hypotheses to explain the document. (Java) CMUCS
CMUML Knowledge Base (KB) Web Service NLP Retrieves semantic information about entities, slots, and their values. Also enables inference over the KB to determine confidence, justification, metadata, and relevant text for a given entity-slot pair. (Java) MIT
Johns Hopkins University (publications) PPDB (Paraphrase Database) NLP Generates a ranked list of potential paraphrases for the input. (Java) BSD
Johns Hopkins University (publications) DAPPER (Domain Adapted Paraphraser) NLP Generates a database of domain-specific paraphrases using the input data. BSD
Johns Hopkins University (publications) NattyLo - Recognizing Textual Entailment (RTE) System NLP Identifies entailment relations (entails, contradicts, unrelated, etc.) between pairs of passages. BSD
UIUC (publications) Recognizing Textual Entailment NLP Identifies entailment relations (entails, contradicts, unrelated, etc.) between pairs of passages or structured hypothesis and passage. (Java) LLVM
UIUC (publications) Extended Semantic Role Labeling NLP Extracts n-ary relations within the document. Relation set is closer semantic/linguistic relations (e.g. BENEFICIARY, Arg-0). LLVM
UIUC (publications) Discourse (Within Document complements Coreference) NLP Recognizes relations between events, where events are ESRL predicate/argument structures. Relations include CAUSE/ENABLE, PART-OF. (Java) LLVM
UIUC (publications) Wikifier NLP Identifies a coherent set of concepts of interest in a given document, based on entities/concepts in a reference collection (Wikipedia, for now). Links entities in a document to Wikipedia URLs. (Java) LLVM
UIUC (publications) POS Tagger NLP Assigns part of speech tags to words in the input document. LLVM
UIUC (publications) Chunker NLP Divides the input documents into shallow parse chunks. LLVM
UIUC (publications) Named Entity Recognizer NLP Identifies named entities in the input document. GPLv2
UIUC (publications) Coreference (within document) NLP Identifies coreferences in a document for both entities and verbs. (Java) LLVM
UIUC (publications) Profiler NLP Identifies entity mentions in a collection of documents, and generates relevant relational statistics. The relational statistics are expressed in terms of predicate-argument structure. (Java) LLVM
Cornell University (publications) Attitude Extraction NLP Identifies the attitude (positive/negative) of an entity toward other entities within a sentence. (Java, Perl) GPL, Cornell/UNT/Pitt open-source license
Cornell University (publications) Textual Similarity NLP Assigns scores between [0,1] indicating the similarity in meaning between two sentences. (Java, Perl) GPL, BSD, Cornell/UNT/Pitt open-source license
Stanford University (publications) Lexical Entailment Resource - get Relation NLP Identifies entailment relation (entails, contradicts, etc.) between a given pair of phrases. GPLv2
Stanford University (publications) Stanford's Coreference (Within Document) NLP Identifies entity and verb coreferences within the input document. (Java) GPLv2
Stanford University (publications) Semantic Textual Similarity NLP Calculates the semantic similarity between a pair of input texts. GPLv2
Stanford University (publications) Stanford's Labeled N-Ary Relations between Typed Text Spans (v1- Relation Extraction for KBP-Slot Filling) NLP Extracts binary relations from a document. Relations are pulled from an ontology (e.g. attended school). (Java) GPLv2
Stanford University (publications) Stanford's Coreference (Cross document with resolution to KB) NLP Identifies entity coreferences across documents, and updates the KB with new found entities. System was designed for people and organizations. GPLv2
Stanford University (publications) Stanford's Mention Detection (Within Document) NLP Identifies and labels entity mentions within a document. (Java) GPLv2
Stanford University (publications) Stanford Named Entity Recognizer NLP Identifies named entities in text. Can be used with various models. As packaged by default with Stanford CoreNLP, 12 classes are recognized: DATE, TIME, DURATION, SET, NUMBER, ORDINAL, PERSON, LOCATION, ORGANIZATION, MONEY, PERCENT, MISC. (Java) GPLv2
BBN (publications) ADEPT v1.6 Interface, API Interfaces and integration framework for DEFT HLT algorithms and components. (Java) UGPR
BBN (publications), Next Century Corporation ADEPT OWF-based Demo v1 Visualization User interface and visualization. (Java) UGPR
(WiscTeam) Indiana University Knowledge Intensive Learning: Combining Qualitative Constraints with Causal Independence for Parameter Learning in Probabilistic Models
(WiscTeam) University of Wisconsin - Madison, Indiana University Guiding Autonomous Agents to Better Behaviors through Human Advice
(WiscTeam) Wake Forest University, Indiana University, University of Wisconsin - Madison, University of Alberta, Canada AR-Boost: Reducing Overfitting by a Robust Data-Driven Regularization Strategy
(WiscTeam) Fraunhofer IAIS, TU Dortmund, Indiana University Lifted Online Training of Relational Models with Stochastic Gradient Methods
(WiscTeam) Fraunhofer IAIS, TU Dortmund, Indiana University Lifted Parameter Learning in Relational Models
(WiscTeam) Indiana University, Cycorp, University of Wisconsin - Madison, TU Dortmund, Oregon State University Accelerating Imitation Learning in Relational Domains via Transfer by Initialization
(WiscTeam) Indiana University, Wake Forest University, University of Wisconsin - Madison, Stanford University, TU Dortmund Using Commonsense Knowledge to Automatically Create (Noisy) Training Examples from Text
(WiscTeam) University of Wisconsin - Madison, Indiana University, TU Dortmund Learning Relational Probabilistic Models from Partially Observed Data - Opening the Closed-World Assumption
(WiscTeam) University of Wisconsin - Madison, Indiana University, TU Dortmund Structure Learning with Hidden Data in Relational Domains
(WiscTeam) University of Wisconsin - Madison, Stanford University Understanding Tables in Context Using Standard NLP Toolkits
(WiscTeam) University of Wisconsin - Madison, Stanford University Towards High-Throughput Gibbs Sampling at Scale: A Study across Storage Managers
(WiscTeam) University of Wisconsin - Madison, Stanford University Scaling Inference for Markov Logic via Dual Decomposition
(WiscTeam) University of Wisconsin - Madison, Stanford University Parallel Stochastic Gradient Algorithms for Large-Scale Matrix Completion
(WiscTeam) University of Wisconsin - Madison, Stanford University Hazy: Making it Easier to Build and Maintain Big-data Analytics
(WiscTeam) University of Wisconsin - Madison, Stanford University Big Data versus the Crowd: Looking for Relationships in All the Right Places
(WiscTeam) University of Wisconsin - Madison, Stanford University Elementary: Large-scale Knowledge-base Construction via Machine Learning and Statistical Inference
(WiscTeam) University of Wisconsin - Madison, Stanford University DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference
(WiscTeam) University of Wisconsin - Madison, Stanford University Optimizing Statistical Information Extraction Programs Over Evolving Text
(WiscTeam) University of Wisconsin - Madison, Stanford University Toward a Noncommutative Arithmetic-Geometric Mean Inequality: Conjectures, Case-studies, and Consequences
(WiscTeam) University of Wisconsin - Madison, Stanford University Factoring Nonnegative Matrices with Linear Programs
(WiscTeam) University of Wisconsin - Madison, Stanford University, Others Building an Entity-Centric Stream Filtering Test Collection for TREC 2012
(WiscTeam) University of Wisconsin - Madison, Stanford University, University of Washington Understanding Cardinality Estimation using Entropy Maximization
(WiscTeam) University of Wisconsin - Madison, University of California - Berkeley, Indiana University Learning Relational Structure for Temporal Relation Extraction
(WiscTeam) Wake Forest University, SRI, Indiana University Initial Empirical Evaluation of Anytime Lifted Belief Propagation
Bar Ilan University, CUNY Efficient Implementation of Beam-Search Incremental Parsers
CMUCS A Structured Distributional Semantic Model for Event Co-reference
CMUCS A Structured Distributional Semantic Model: Integrating Structure with Semantics
CMUCS Events are Not Simple: Identity, Non-Identity, and Quasi-Identity
CMUCS Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic
CMUCS Story-Level Inference and Gap Filling to Improve Machine Reading
Columbia Dept of CS, Columbia CCLS Written Dialog and Social Power: Manifestations of Different Types of Power in Dialog Behavior
Columbia Dept of CS, Columbia CCLS Automatic Extraction of Social Networks from Literary Text: A Case Study on Alice In Wonderland
Columbia Dept of CS, Columbia CCLS SINNET: Social Network Extractor from Text
Columbia University Columbia NLP: Sentiment Detection of Subjective Phrases in Social Media
Columbia University, Rensselaer Polytechnic institute, George Washington University Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
Columbia University, University of the Basque Country, George Washington University *SEM 2013 shared task: Semantic Textual Similarity
CUNY Minibatch and Parallelization for Online Large Margin Structured Learning
CUNY Optimal Incremental Parsing via Best-First Dynamic Programming
CUNY Detecting Laughter and Filled Pauses Using Syllable-based Features
CUNY Sure, I Did The Right Thing: A System for Sarcasm Detection in Speech
CUNY Let Me Finish: Automatic Conflict Detection Using Speaker Overlap
CUNY, Google Online Learning for Inexact Hypergraph Search
ICT, CUNY, IBM Max-Violation Perceptron and Forced Decoding for Scalable MT Training
IHMC Topical Positioning: A New Method for Predicting Opinion Changes in Conversation
IHMC CASTLE: Crowd-Assisted System for Textual Labeling & Extraction
LDC Annotation Trees: LDC's Customizable, Extensible, Scalable Annotation Infrastructure
Oregon State University Output Space Search for Structured Prediction
Oregon State University HC-Search: Learning Heuristics and Cost Functions for Structured Prediction
RPI Detecting Community Evolution and Leadership Roles during Emergencies
RPI RPI-BLENDER TAC-KBP2013 Knowledge Base Population System Description
RPI Evolution of Communities on Twitter and the Role of their Leaders during Emergencies
RPI Tackling Representation, Annotation and Classification Challenges for Temporal Knowledge Base Population
RPI, CUNY Joint Event Extraction via Structured Prediction with Global Features
RPI, Google Expanding Microblog Context to Enhance Disambiguation to Wikipedia
RPI, IBM, UIUC, NEU Resolving Entity Morphs in Censored Data
RPI, ISI Curating and Contextualizing Twitter Stories to Assist with Social Newsgathering
RPI, UIUC Constructing Topical Hierarchies in Heterogeneous Information Networks
RPI, UIUC EventCube: Multi-Dimensional Search and Mining of Structured and Text Data
RPI, CUNY CUNY_BLENDER TAC-KBP2012 Entity Linking System and Slot Filling Validation System
RPI, HIT Combining Social Cognitive Theories with Linguistic Features for Multi-genre Sentiment Analysis
RPI, SRI Name-aware Machine Translation
RPI, SRI, IBM Joint Bilingual Name Tagging for Parallel Corpora
RPI, UIUC Exploring and Inferring User-User Pseudo-Friendship for Sentiment Analysis with Heterogeneous Networks
RPI, UIUC, BBN, ARL Tweet Ranking based on Heterogeneous Networks
Stanford University Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Stanford University Semantic Parsing on Freebase from Question-Answer Pairs
Stanford University These Guys Are Teaching Computers How to Think Like People
Stanford University Stanford researchers to open-source model they say has nailed sentiment analysis
Stanford University Generating Recommendation Dialogs by Extracting Information from User Reviews
Stanford University Parsing With Compositional Vector Grammars
Stanford University Better Word Representations with Recursive Neural Networks for Morphology
Stanford University Philosophers are Mortal: Inferring the Truth of Unseen Facts
Stanford University The Life and Death of Discourse Entities: Identifying Singleton Mentions
Stanford University Same Referent, Different Words: Unsupervised Mining of Opaque Coreferent Mentions
Stanford University Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors
Stanford University Convolutional-Recursive Deep Learning for 3D Object Classification
Stanford University Joint Entity and Event Coreference Resolution across Documents
Stanford University Learning Constraints for Consistent Timeline Extraction
Stanford University Semantic Compositionality Through Recursive Matrix-Vector Spaces
Stanford University Multi-instance Multi-label Learning for Relation Extraction
Stanford University Probabilistic Finite State Machines for Regression-based MT Evaluation
Stanford University Improving Word Representations via Global Context and Multiple Word Prototypes
Stanford University Stanford: Probabilistic Edit Distance Metrics for STS
Stanford University SPEDE: Probabilistic Edit Distance Metrics for MT Evaluation
Stanford University Parsing Time: Learning to Interpret Time Expressions
Stanford University SUTIME: A Library for Recognizing and Normalizing Time Expressions
Stanford University, Google Research Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
Stanford University, Google Research Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models
Stanford University, Google Research Three Dependency-and-Boundary Models for Grammar Induction
Stanford University, Google Research Capitalization Cues Improve Dependency Grammar Induction
Stanford University, Harbin Institute of Technology Effective Bilingual Constraints for Semi-supervised Learning of Named Entity Recognizers
Stanford University, Harbin Institute of Technology Named Entity Recognition with Bilingual Constraints
Stanford University, Harbin Institute of Technology A Comparison of Chinese Parsers for Stanford Dependencies
Stanford University, Harbin Institute of Technology Stanford's System for Parsing the English Web
University of Texas - Dallas, University of Washington Structured Message Passing
UIUC Recognizing Textual Entailment
UIUC An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines)
UIUC Structured Learning with Constrained Conditional Models
UIUC Unified Expectation Maximization
UIUC Illinois-Coref: The UI System in the CoNLL-2012 Shared Task
UIUC On Amortizing Inference Cost for Structured Prediction
UIUC Joint Inference for Event Timeline Construction
UIUC Efficient Decomposed Learning for Structured Prediction
UIUC A Framework for Tuning Posterior Entropy in Unsupervised Learning
UIUC Efficient Pattern-Based Time Series Classification on GPU
UIUC Modeling Semantic Relations Expressed by Prepositions
UIUC The University of Illinois System in the CoNLL-2013 Shared Task
UIUC Margin-based Decomposed Amortized Inference
UIUC Multi-core Structural SVM Training
UIUC Relational Inference for Wikification
UIUC A Constrained Latent Variable Model for Coreference Resolution
UIUC Illinois Cognitive Computation Group UI-CCG TAC 2013 Entity Linking and Slot Filler Validation Systems
UIUC ILLINOISNLPCLOUD: Text Analytics Services in the Cloud
UIUC, Bar Ilan University, University of Rome Tor Vergata Recognizing Textual Entailment: Models and Applications
UIUC, LDC Annotating Textual Inference
University of Massachusetts Amherst Latent Relation Representations for Universal Schemas
University of Massachusetts Amherst Dynamic Knowledge-Base Alignment for Coreference Resolution
University of Massachusetts Amherst Unsupervised Relation Discovery with Sense Disambiguation
University of Massachusetts Amherst A Discriminative Hierarchical Model for Fast Coreference at Large Scale
University of Massachusetts Amherst Monte Carlo MCMC: Efficient Inference by Approximate Sampling
University of Massachusetts Amherst Monte Carlo MCMC: Efficient Inference by Sampling Factors
University of Massachusetts Amherst Probabilistic Databases of Universal Schema
University of Massachusetts Amherst Learning to speed up MAP decoding with column generation
University of Massachusetts Amherst Improving NLP through Marginalization of Hidden Syntactic Structure
University of Massachusetts Amherst MAP Inference in Chains using Column Generation
University of Massachusetts Amherst Across-Document Neighborhood Expansion: UMass at TAC KBP 2012 Entity Linking
University of Massachusetts Amherst Relation Extraction with Matrix Factorization and Universal Schemas
University of Texas - Dallas Fast Joint Compression and Summarization via Graph Cuts
University of Texas - Dallas Document Summarization via Guided Sentence Compression
University of Texas - Dallas Using Denoising Autoencoder for Emotion Recognition
University of Texas - Dallas A Preliminary Study of Cross-lingual Emotion Recognition from Speech: Automatic Classification versus Human Perception
University of Texas - Dallas Using Supervised Bigram-based ILP for Extractive Summarization
University of Texas - Dallas Disfluency Detection Using Multi-step Stacked Learning
University of Washington A sequential repetition model for improved disfluency detection
University of Washington Discriminative Learning of Sum-Product Networks
University of Washington Learning the Structure of Sum-Product Networks
University of Washington Tractable Probabilistic Knowledge Bases with Existence Uncertainty
University of Washington Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
University of Washington Joint Coreference Resolution and Named-Entity Linking with Multi-pass Sieves
University of Washington Extracting Meronyms for a Biology Knowledge Base using Distant Supervision
University of Washington Fine-Grained Entity Recognition
University of Washington Learning Distributions over Logical Forms for Referring Expression Generation
University of Washington Scaling Semantic Parsers with On-the-fly Ontology Matching
University of Washington Modeling Missing Data in Distant Supervision for Information Extraction
University of Washington Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions
University of Washington Paraphrase-Driven Learning for Open Question Answering
University of Washington Generating Coherent Event Schemas at Scale
University of Washington Towards Coherent Multi-Document Summarization
University of Washington Entity Linking at Web Scale
University of Washington Rel-grams: A Probabilistic Model of Relations in Text
University of Washington No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
University of Washington Open Language Learning for Information Extraction
USC/Information Sciences Institute, LDC, Colorado, CMUCS Abstract Meaning Representation for Sembanking
University of Texas - Austin Learning to 'Read Between the Lines' Using Bayesian Logic
University of Texas - Austin Towards a semantics for distributional representations
University of Texas - Austin Intentionality was only alleged: On adjective-noun composition in distributional semantics
University of Texas - Austin Montague Meets Markov: Deep Semantics with Probabilistic Logical Form
University of Texas - Austin Online Inference-Rule Learning from Natural-Language Extractions
University of Texas - Austin A Formal Approach to Linking Logical Form and Vector-Space Lexical Semantics
University of Pittsburgh Benefactive/Malefactive Event and Writer Attitude Annotation
University of North Texas, Cornell University, University of Pittsburgh, Google, LCC CPN-CORE: A Text Semantic Similarity System Infused with Opinion Knowledge
University of North Texas Sentiment Analysis of Online Spoken Reviews
University of North Texas, USC/ICT Utterance-Level Multimodal Sentiment Analysis
Cornell University Negative Deceptive Opinion Spam
Cornell University TopicSpam: a Topic-Model-Based Approach for Spam Detection
Cornell University Joint Inference for Fine-grained Opinion Extraction
Cornell University Identifying Manipulated Offerings on Review Portals
Johns Hopkins University, University of Pennsylvania PARMA: A Predicate Argument Aligner
Johns Hopkins University, University of Pennsylvania PPDB: The Paraphrase Database
Johns Hopkins University Evaluating Progress in Probabilistic Programming through Topic Models