Summarizer is an artificial intelligence newspaper
Introduction
In this age of globalization, time is of the essence for everyone, and digital currency was introduced in part to reduce transaction time. Cryptocurrency began as a payment system that lets people transact quickly, without third parties, in a transparent, secure, and anonymous manner. As the crypto and blockchain ecosystem has grown, alternative investment opportunities have flourished, and some have proven to be more efficient and profitable than traditional financial instruments. Cryptocurrency is potentially the largest digital asset for investment because it is user-friendly, secure, and cuts steep transaction costs. It is one of the biggest developments in the financial market and has removed barriers to financial transactions worldwide. By leveraging blockchain technology, cryptocurrency has established a decentralized, transparent, tamper-resistant, and accountable system.
It is a fully automated newspaper.
Summarizer aims to make your daily news shorter by using AI. Its bots crawl the web for news articles, summarize them, and then sort the summaries into categories.
What is SUMMARIZER?
Summarizer is exclusive to $SMR holders. You do not pay a subscription; you simply hold $SMR to read Summarizer content. If at any time you decide to stop reading Summarizer, you can sell your $SMR back to the market.
The algorithm we use
TextRank is an unsupervised algorithm for the automated summarization of texts that can also be used to obtain the most important keywords in a document. The algorithm applies a variation of PageRank over a graph constructed specifically for the task of summarization. This produces a ranking of the elements in the graph: the most important elements are the ones that better describe the text. This approach allows TextRank to build summaries without the need of a training corpus or labeling and allows the use of the algorithm with different languages.
For the task of automated summarization, TextRank models any document as a graph using sentences as nodes. A function to compute the similarity of sentences is needed to build the edges between them. This function is used to weight the graph edges: the higher the similarity between two sentences, the more important the edge between them will be in the graph. In terms of a random walker, as frequently used in PageRank, we are more likely to go from one sentence to another if they are very similar.
TextRank determines the relation of similarity between two sentences based on the content that both share. This overlap is calculated simply as the number of common lexical tokens between them, divided by the length of each to avoid promoting long sentences.
The function featured in the original algorithm can be formalized as:
Definition 1. Given S_i, S_j, two sentences each represented by a set of n words, where S_i = w_1^i, w_2^i, …, w_n^i, the similarity function for S_i, S_j can be defined as:
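The equation is not reproduced in this text. As a hedged reconstruction, the similarity function from the original TextRank paper is usually written as follows, with the "length" normalization taken as the logarithm of each sentence length (the symbol w_k for a shared word is an assumption of this sketch):

```latex
\mathrm{Sim}(S_i, S_j) =
\frac{\left|\{\, w_k \mid w_k \in S_i \ \text{and}\ w_k \in S_j \,\}\right|}
     {\log\left(|S_i|\right) + \log\left(|S_j|\right)}
```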
The result of this process is a dense graph representing the document. From this graph, PageRank is used to compute the importance of each vertex. The most significant sentences are selected and presented as the summary, in the same order as they appear in the document. The variations we tested change the way distances between sentences are computed to weight the edges of the graph used for PageRank. These similarity measures are orthogonal to the TextRank model, so they can be easily integrated into the algorithm. We found some of these variations to produce significant improvements over the original algorithm.
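The pipeline described above (build a sentence graph, run PageRank, keep the top sentences in document order) can be sketched in a few lines of Python. This is a minimal illustration and not Summarizer's actual code: the function names, the regex tokenizer, the damping factor of 0.85, and the fixed 50 iterations are all assumptions made for the example.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def overlap_similarity(a, b):
    """Shared tokens divided by the sum of the log sentence lengths."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    denom = math.log(len(ta)) + math.log(len(tb))
    # Two one-word sentences give log(1) + log(1) = 0; treat as no similarity.
    return len(ta & tb) / denom if denom > 0 else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by plain power iteration over an adjacency matrix."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, k=2):
    """Return the k highest-ranked sentences, in their original order."""
    n = len(sentences)
    if n == 0:
        return []
    # Dense similarity graph; no self-loops.
    weights = [
        [overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    article = [
        "The cat sat on the mat",
        "The cat sat on the sofa",
        "Stock prices fell sharply today",
        "The cat likes the mat",
    ]
    for line in summarize(article, k=2):
        print(line)
```

Swapping `overlap_similarity` for another function is the only change needed to try a different edge weighting, which is what makes the similarity measure orthogonal to the rest of the model.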
BM25
BM25 / Okapi-BM25 is a ranking function widely used as the state of the art for Information Retrieval tasks. BM25 is a variation of the TF-IDF model using a probabilistic model.
Definition 2. Given two sentences R, S, BM25 is defined as:
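The equation is not reproduced in this text. The usual Okapi BM25 form, written with the k, b and avgDL named in the text, is sketched below. Here s_i denotes the i-th term of S, f(s_i, R) the number of times that term occurs in R, and |R| the length of R in words; these symbol names are assumptions of this reconstruction.

```latex
\mathrm{BM25}(R, S) = \sum_{i=1}^{n} \mathrm{IDF}(s_i) \cdot
\frac{f(s_i, R)\,(k + 1)}
     {f(s_i, R) + k \left(1 - b + b \cdot \frac{|R|}{\mathrm{avgDL}}\right)}
```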
where k and b are parameters. We used k = 1.2 and b = 0.75. avgDL is the average length of the sentences in our collection.
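As a rough illustration of how such a score could be computed for a pair of sentences, the sketch below applies the standard Okapi BM25 sum with k = 1.2 and b = 0.75. The function name `bm25`, the list-of-tokens representation, and the precomputed `idf` dictionary are assumptions made for the example, not part of the original description.

```python
K = 1.2   # term-frequency saturation parameter, as stated in the text
B = 0.75  # length-normalization parameter, as stated in the text


def bm25(r_tokens, s_tokens, idf, avg_dl, k=K, b=B):
    """Score sentence R against the terms of sentence S.

    r_tokens, s_tokens: lists of word tokens.
    idf: dict mapping a term to its IDF value (assumed to be precomputed).
    avg_dl: average sentence length in the collection; must be > 0.
    """
    length_norm = k * (1 - b + b * len(r_tokens) / avg_dl)
    score = 0.0
    for term in s_tokens:
        freq = r_tokens.count(term)  # f(s_i, R)
        score += idf.get(term, 0.0) * freq * (k + 1) / (freq + length_norm)
    return score


if __name__ == "__main__":
    r = ["cat", "sat", "mat"]
    s = ["cat", "dog"]
    print(bm25(r, s, idf={"cat": 1.0, "dog": 2.0}, avg_dl=3))
```

Note that the score is not symmetric: `bm25(r, s, ...)` and `bm25(s, r, ...)` generally differ, so the resulting sentence graph is directed unless the two values are combined.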
This definition implies that if a word appears in more than half the documents of the collection, its IDF will be negative. Since this can cause problems in the next stage of the algorithm, we used the following correction formula:
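Neither the IDF definition nor the correction survives in this text. One reconstruction that is consistent with the surrounding description is given below, assuming N is the number of sentences in the collection and n(s_i) the number of sentences containing the term s_i. The first line is the standard BM25 IDF; the second replaces it with a small positive value whenever the term occurs in more than half of the sentences.

```latex
\mathrm{IDF}(s_i) = \log\bigl(N - n(s_i) + 0.5\bigr) - \log\bigl(n(s_i) + 0.5\bigr)

\mathrm{IDF}(s_i) =
\begin{cases}
\log\bigl(N - n(s_i) + 0.5\bigr) - \log\bigl(n(s_i) + 0.5\bigr) & \text{if } n(s_i) \le N/2 \\
\varepsilon \cdot \mathrm{avgIDF} & \text{if } n(s_i) > N/2
\end{cases}
```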
where ε takes a value between 0.30 and 0.5 and avgIDF is the average IDF for all terms. Other corrective strategies were also tested, such as setting ε = 0 and using simpler modifications of the classic IDF formula.
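A small sketch of this correction in Python is shown below. It assumes the standard BM25 IDF, log(N - n + 0.5) - log(n + 0.5), and replaces the value with ε · avgIDF for any term found in more than half of the sentences. The function name `corrected_idf`, the default ε of 0.3, and the choice to average over the uncorrected values are assumptions of this example.

```python
import math


def corrected_idf(sentences, epsilon=0.3):
    """Return a term -> IDF map with the negative-IDF correction applied.

    sentences: list of token lists; each sentence is treated as one document.
    epsilon: scaling factor for the replacement value (assumed default).
    """
    n_docs = len(sentences)
    doc_freq = {}
    for tokens in sentences:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    if not doc_freq:
        return {}

    # Standard BM25 IDF; negative when a term is in more than half the documents.
    raw = {
        term: math.log(n_docs - df + 0.5) - math.log(df + 0.5)
        for term, df in doc_freq.items()
    }
    # Average taken over the uncorrected values (an assumption of this sketch).
    avg_idf = sum(raw.values()) / len(raw)

    return {
        term: epsilon * avg_idf if doc_freq[term] > n_docs / 2 else value
        for term, value in raw.items()
    }


if __name__ == "__main__":
    corpus = [["the", "cat"], ["the", "dog"], ["the", "bird"], ["a", "fish"]]
    for term, value in sorted(corrected_idf(corpus).items()):
        print(term, round(value, 4))
```

In this toy corpus "the" appears in three of four sentences, so its raw IDF is negative and it receives the small positive replacement value, while rarer terms keep their standard IDF.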
Evaluation
We tested LCS, Cosine Sim, BM25 and BM25+ as different ways to weight the edges for the TextRank graph. The best results were obtained using BM25 and BM25+ with the corrective formula shown in equation 3. We achieved.
Summarizer Team
Brandon Thomas – Frontend Developer
Chris Miller – Blockchain Developer
Joy Stewart – Communications Manager
Julie Hardin – Marketing Manager
Mike Cook – Graphic Designer
Robert Hoover – Backend Developer
Steve Willis – Software Engineer
Useful links to the project:
The official website – https://summarizer.co/
Tokenomics website – https://token.summarizer.co/
Telegram – https://t.me/SummarizerOfficial
Twitter – https://twitter.com/SummarizerC
Medium – https://medium.com/@sum
Bitcointalk profile – https://bitcointalk.org/index.php?action=profile;u=1522325;sa=summary
ETH address – 0xa886F31d706CC107eF112D5A648820eBe075d104