Information Extraction Summary
Basic Paper Information
-
Paper title: Natural Language Processing for Information Extraction
Natural Language Processing for Information Extraction
Information Extraction Tasks
-
Parts-of-Speech (POS) tagging
marking each word with a part-of-speech label such as noun, verb, adjective, or preposition
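A minimal sketch of what tagging produces, assuming a tiny hand-written lexicon plus suffix rules. The names `LEXICON` and `tag` and the coarse tag set are hypothetical and chosen only for illustration; real taggers are trained on annotated corpora.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; a real tagger learns this from annotated data.
LEXICON: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "on": "ADP", "in": "ADP",
    "cat": "NOUN", "mat": "NOUN",
    "sat": "VERB",
}

def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Assign one coarse part-of-speech label to each token."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            label = LEXICON[word]          # known word: use the lexicon
        elif word.endswith("ly"):
            label = "ADV"                  # crude suffix heuristic
        elif word.endswith("ing") or word.endswith("ed"):
            label = "VERB"                 # crude suffix heuristic
        else:
            label = "NOUN"                 # default guess for unknown words
        tagged.append((token, label))
    return tagged

print(tag("The cat sat quietly on the mat".split()))
```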
-
Parsing
Constituency parsers
Dependency Parsing
-
Named Entity Recognition (NER)
The task is to find mentions of Person (PER), Organization (ORG), Location (LOC), and Geo-Political Entity (GPE) types in text
2015, NER using Word2Vec features
2016, CharNER (character-based named entity recognition), cross-lingual NER, LSTM-CRF
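To make the task's input and output concrete, here is a minimal dictionary-lookup baseline. It is not the Word2Vec-feature or LSTM-CRF approach listed above; the gazetteer entries and the function name `tag_entities` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer mapping lowercase names to entity types.
GAZETTEER: Dict[str, str] = {
    "barack obama": "PER",
    "google": "ORG",
    "mount everest": "LOC",
    "france": "GPE",
}

def tag_entities(text: str) -> List[Tuple[str, str]]:
    """Return (mention, label) pairs via longest-match gazetteer lookup."""
    tokens = text.replace(",", " ").replace(".", " ").split()
    max_len = max(len(name.split()) for name in GAZETTEER)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest span first so multi-word names win over prefixes.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            label = GAZETTEER.get(span.lower())
            if label is not None:
                found.append((span, label))
                matched = n
                break
        i += matched if matched else 1
    return found

print(tag_entities("Barack Obama visited Google in France."))
```

A lookup like this cannot label names it has never seen, which is one reason learned sequence models are used instead.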
-
Named Entity Linking (NEL), Named Entity Disambiguation (NED), Named Entity Normalization (NEN)
determining the identity of entities mentioned in text, typically by linking each mention to a knowledge base entry
2015, Personalized PageRank
2016, language-independent entity linking (LIEL) system (a model trained on one language works for a number of different languages)
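As a rough illustration of how Personalized PageRank can rank candidate entities for an ambiguous mention, the sketch below runs power iteration with the restart mass placed on context entities. The graph, the entity names, and the function `personalized_pagerank` are hypothetical; this is not the cited 2015 system.

```python
from typing import Dict, List, Set

def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Set[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power iteration where the random walk restarts only at seed nodes.

    Assumes every neighbor listed in `graph` is also a key of `graph`.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            neighbors = graph[node]
            if not neighbors:
                # Dangling node: return its mass to the seed distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * restart[n]
                continue
            share = damping * rank[node] / len(neighbors)
            for neighbor in neighbors:
                new_rank[neighbor] += share
        rank = new_rank
    return rank

# Hypothetical knowledge-base link graph; "France" is an unambiguous context entity.
graph = {
    "France": ["Paris_France", "Eiffel_Tower"],
    "Paris_France": ["France", "Eiffel_Tower"],
    "Eiffel_Tower": ["Paris_France", "France"],
    "Paris_Hilton": ["Hilton_Hotels"],
    "Hilton_Hotels": ["Paris_Hilton"],
}
ranks = personalized_pagerank(graph, seeds={"France"})
candidates = ["Paris_France", "Paris_Hilton"]
print(max(candidates, key=ranks.get))
```

Candidates connected to the context entities accumulate rank, while unrelated candidates receive none.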
-
Coreference Resolution (CR)
determining which noun phrases (including pronouns, proper names and common names) refer to the same entities in documents
open-source platforms: BART, Illinois Coreference Package, Stanford CoreNLP
joint models
CR-NER (Haghighi and Klein, 2010; Singh et al., 2013)
CR-NEL (Hajishirzi et al., 2013)
NER-CR-NEL (Durrett and Klein, 2014)
CCR-NEL (Dutta et al., 2015).
-
Temporal Information Extraction (Event Extraction)
identifying events and time expressions in text that can be placed in temporal order
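A minimal sketch of the ordering step, assuming each sentence carries an explicit ISO-style date (YYYY-MM-DD). Relative expressions such as "last week" are out of scope here, and `order_events` is a hypothetical helper name.

```python
import re
from datetime import date
from typing import List, Tuple

DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def order_events(sentences: List[str]) -> List[Tuple[date, str]]:
    """Pair each dated sentence with its date and sort chronologically."""
    dated: List[Tuple[date, str]] = []
    for sentence in sentences:
        match = DATE_PATTERN.search(sentence)
        if match is None:
            continue  # no explicit date: cannot be placed on the timeline
        year, month, day = (int(part) for part in match.groups())
        try:
            dated.append((date(year, month, day), sentence))
        except ValueError:
            continue  # digits matched but do not form a valid calendar date
    dated.sort(key=lambda pair: pair[0])
    return dated

sentences = [
    "The merger closed on 2016-03-01.",
    "Talks began on 2015-11-20.",
    "No date is mentioned here.",
]
for when, sentence in order_events(sentences):
    print(when.isoformat(), sentence)
```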
-
Relation Extraction (RE)
detecting and classifying pre-defined relationships between entities identified in text
2016, Seq2Seq (reduces the need for annotated data), attention-based CNN
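To show the shape of the output (head entity, relation label, tail entity), here is a hand-written pattern baseline. It is not the Seq2Seq or attention-based CNN approach noted above; the relation labels, patterns, and `extract_relations` are hypothetical.

```python
import re
from typing import List, Tuple

# A run of capitalized words stands in for an already-recognized entity mention.
ENTITY = r"[A-Z]\w*(?: [A-Z]\w*)*"

# Hypothetical pre-defined relations, each paired with one surface pattern.
PATTERNS = [
    ("born_in", re.compile(rf"(?P<head>{ENTITY}) was born in (?P<tail>{ENTITY})")),
    ("works_for", re.compile(rf"(?P<head>{ENTITY}) works for (?P<tail>{ENTITY})")),
]

def extract_relations(text: str) -> List[Tuple[str, str, str]]:
    """Return (head, relation, tail) triples for every pattern match."""
    triples: List[Tuple[str, str, str]] = []
    for relation, pattern in PATTERNS:
        for match in pattern.finditer(text):
            triples.append((match.group("head"), relation, match.group("tail")))
    return triples

text = "Marie Curie was born in Warsaw. Alan Turing works for Bletchley Park."
print(extract_relations(text))
```

Each new relation or phrasing needs another hand-written pattern, which is the annotation and engineering cost that learned models aim to reduce.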
-
Knowledge Base Reasoning and Completion, link prediction
determining missing relationships between entities in a knowledge graph
2015, Relational machine learning
(1) statistical relational learning, (2) path ranking methods, (3) embedding-based models
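As one possible instance of an embedding-based model, the sketch below uses a TransE-style translation score, where a triple (head, relation, tail) is plausible when head + relation lies close to tail. The notes do not say which embedding model the paper covers, so this choice is an assumption. The 2-D vectors are hand-picked rather than learned, and all names are hypothetical.

```python
import math
from typing import Dict, Tuple

Vector = Tuple[float, float]

# Hand-picked toy embeddings; a real model would learn these from known triples.
ENTITY_VECS: Dict[str, Vector] = {
    "Paris": (1.0, 0.0),
    "France": (1.0, 1.0),
    "Berlin": (3.0, 0.0),
    "Germany": (3.0, 1.0),
}
RELATION_VECS: Dict[str, Vector] = {"capital_of": (0.0, 1.0)}

def score(head: str, relation: str, tail: str) -> float:
    """Negative Euclidean distance between head + relation and tail."""
    h, r, t = ENTITY_VECS[head], RELATION_VECS[relation], ENTITY_VECS[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def predict_tail(head: str, relation: str) -> str:
    """Link prediction: pick the entity that best completes (head, relation, ?)."""
    candidates = [entity for entity in ENTITY_VECS if entity != head]
    return max(candidates, key=lambda entity: score(head, relation, entity))

print(predict_tail("Paris", "capital_of"))
print(predict_tail("Berlin", "capital_of"))
```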
Information Extraction Tools
-
Public IE tools:
GATE (General Architecture for Text Engineering), OpenNLP (Apache OpenNLP, a Java machine learning toolkit for NLP), Stanford NER, GExp, MALLET (MAchine Learning for LanguagE Toolkit), NLTK (Natural Language Toolkit, a suite of Python libraries for NLP).
-
Commercial IE tools:
Attensity, Open Calais, Clarabridge, SAS Text Analytics, Business Objects, IBM Intelligent Miner, and LingPipe.
-
Specialized IE tools:
Ariadne Genomics MedScan Reader for biomedical documents, RINX for resumes.
Current challenges and future research
- Open Information Extraction (OpenIE)
- BioIE
- Business Analytics
- Text IE in Images and Videos
- Web Harvesting