MEAD

MEAD is a natural language processing (NLP) summarizer that underlies MiMI's BioNLP searches. You can also download MEAD for other uses, such as building a summarizer from scratch or evaluating the impact of a new sentence feature on a summarization process: http://www.summarization.com/mead.

In MiMI BioNLP searches, MEAD ranks, by relevance, summaries of the research literature pertaining to molecules or interactions relevant to the query items. If you choose to have MEAD sort summaries by relevance, your results will be more semantically cohesive. One caveat: MEAD takes several minutes to semantically sort lists of more than 20 displayed summaries.

MEAD summary content and focus have proven comparable to summaries constructed by human experts, provided the experts apply the same sentence-extraction processes used by MEAD. MEAD extracts sentences from a cluster of related documents. For each sentence, MEAD computes a centroid score, which measures the centrality of the sentence with respect to the overall topic of the cluster. MEAD also computes a position score, which is higher for sentences closer to the beginning of a document, and a score based on overlap with the first sentence or the title. MEAD then summarizes multiple documents at various compression rates, using a battery of algorithms. The summarization draws on a taxonomy of sentence-by-sentence informational relationships between documents in clusters of related documents. For example, MEAD summaries are based on textual indicators of cross-structural informational relationships such as identity, contradiction, attribution, change of perspective, and fulfillment of a prediction. This taxonomy of informational relationships is derived from the Cross-Document Structure Theory (CST) proposed by MEAD's creators.
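The sketch below illustrates, in simplified form, how centroid, position, and first-sentence-overlap features can be combined to rank sentences. It is a minimal illustration under stated assumptions, not MEAD's actual implementation: the tokenization, the exact feature formulas, and the weights w_c, w_p, and w_f are hypothetical.

# Simplified sketch of MEAD-style sentence scoring (illustrative only; not MEAD's code).
from collections import Counter
import math

def tokenize(text):
    # Naive whitespace tokenization; MEAD's preprocessing is more sophisticated.
    return [w.lower() for w in text.split()]

def centroid(documents):
    """Average term-frequency vector over all sentences in the cluster."""
    counts = Counter()
    for doc in documents:
        for sentence in doc:
            counts.update(tokenize(sentence))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def centroid_score(sentence, centroid_vec):
    """Centrality of a sentence with respect to the overall topic of the cluster."""
    return sum(centroid_vec.get(w, 0.0) for w in tokenize(sentence))

def position_score(index, num_sentences):
    """Higher for sentences closer to the beginning of a document."""
    return (num_sentences - index) / num_sentences

def overlap_score(sentence, first_sentence):
    """Word overlap with the document's first sentence (or title)."""
    a, b = set(tokenize(sentence)), set(tokenize(first_sentence))
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def score_document(doc, centroid_vec, w_c=1.0, w_p=1.0, w_f=1.0):
    """Combine the three features into one score per sentence and rank them."""
    scored = []
    for i, sentence in enumerate(doc):
        s = (w_c * centroid_score(sentence, centroid_vec)
             + w_p * position_score(i, len(doc))
             + w_f * overlap_score(sentence, doc[0]))
        scored.append((s, sentence))
    return sorted(scored, reverse=True)

In a setup like this, the highest-scoring sentences across the cluster would be selected until the desired compression rate is reached.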

MEAD also incorporates methods for evaluating summary quality based on human-human, human-computer, and computer-computer agreement. These include co-selection measures, e.g., precision/recall and kappa (a measure of inter-judge agreement relative to the difficulty of the problem), and content-based measures, e.g., cosine similarity to a human summary and word overlap (the number of words the summaries have in common).
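As a rough illustration of these measures, the sketch below computes precision/recall over selected sentences (co-selection) and cosine and word-overlap scores over the summary texts (content-based). It is a simplified sketch under assumed formulas, not MEAD's evaluation code; in particular, word overlap is computed here as a Jaccard-style ratio, and MEAD's exact formulation may differ.

# Illustrative sketch of summary-evaluation measures (assumed formulas; not MEAD's code).
from collections import Counter
import math

def precision_recall(system_sentences, reference_sentences):
    """Co-selection: how many extracted sentences match a human's selection."""
    system, reference = set(system_sentences), set(reference_sentences)
    common = system & reference
    precision = len(common) / len(system) if system else 0.0
    recall = len(common) / len(reference) if reference else 0.0
    return precision, recall

def cosine_similarity(system_text, reference_text):
    """Content-based: cosine between word-count vectors of the two summaries."""
    a = Counter(system_text.lower().split())
    b = Counter(reference_text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def word_overlap(system_text, reference_text):
    """Content-based: fraction of distinct words the two summaries share."""
    a = set(system_text.lower().split())
    b = set(reference_text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0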

Intended Audience

Analysts