The larger aim of the project is to build a chronological ordering of the important sub-events (and their participants) that make up a bigger event. I worked on creating a strong baseline model for identifying important events in news articles and their associated participants. We used articles from the ECB+ Corpus and the TimeBank Corpus. We engineered an extensive set of syntactic and semantic features and used them with a CRF classifier to achieve the following F1 scores:
Event Extraction
- ECB+ Corpus - 73.02%
- TimeBank Corpus - 80.78%
Participants
- ECB+ Corpus - 56.51%
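
To make the setup above more concrete, here is a minimal sketch of how per-token features for a CRF sequence labeler might be built and how an F1 score can be computed from predicted labels. The actual feature set used in the project is not listed here, so every feature name below (`word.lower`, `suffix3`, `prev.pos`, and so on), the `EVENT`/`O` label scheme, and the helper names `token_features`, `sentence_features`, and `f1_score` are illustrative assumptions, not the project's implementation. Part-of-speech tags are assumed to come from an external tagger.

```python
from typing import Dict, List, Sequence, Tuple

# (word, POS tag); the POS tags are assumed to be produced by an external tagger.
Token = Tuple[str, str]


def token_features(sent: Sequence[Token], i: int) -> Dict[str, object]:
    """Build a feature dict for token i (hypothetical feature set)."""
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:].lower(),  # crude morphological cue
        "is_title": word.istitle(),
        "pos": pos,                    # syntactic cue
    }
    # Context window of one token on each side.
    if i > 0:
        prev_word, prev_pos = sent[i - 1]
        feats["prev.word.lower"] = prev_word.lower()
        feats["prev.pos"] = prev_pos
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        next_word, next_pos = sent[i + 1]
        feats["next.word.lower"] = next_word.lower()
        feats["next.pos"] = next_pos
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sent: Sequence[Token]) -> List[Dict[str, object]]:
    """One feature dict per token, in sentence order."""
    return [token_features(sent, i) for i in range(len(sent))]


def f1_score(gold: Sequence[str], pred: Sequence[str], positive: str = "EVENT") -> float:
    """Token-level F1 for a single positive label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A list of such feature dicts per sentence is the input shape that CRF toolkits such as sklearn-crfsuite accept. The F1 helper scores individual tokens; whether the figures above were computed per token or per span is not stated here.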