How @stocks_in works...

While pondering what to write about, I thought: why not share how the @stocks_in Twitter bot works?

As you know, news links are fetched from Google News, but it's not as simple as posting the RSS feed to Twitter. The Business topic of Google News carries hundreds of articles every day, many of them duplicates. Posting everything would clutter your timeline, and posting the same news repeatedly doesn't make sense.

To reduce duplication, only news from business websites is allowed. Sites like Moneycontrol, EconomicTimes or Livemint are allowed, while general news sites like timesofindia, thehindu or ndtv are not. The reason: if there is a big move in the stock market, every news site covers the market that day, and the bot would keep posting the same news from different sites. Blocking some sites solved the issue to some extent, but the bot still picked up the same news from different business sites.
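A site filter like this can be sketched in a few lines of Python. This is only an illustration, not the bot's actual code: the function name is_allowed and the exact domain strings are my assumptions (the post names the sites, not their domains).

```python
from urllib.parse import urlparse

# Assumed allowlist: the domain strings below are guesses for the
# business sites named in the post, not taken from the bot itself.
ALLOWED_DOMAINS = {
    "moneycontrol.com",
    "economictimes.indiatimes.com",
    "livemint.com",
}


def is_allowed(url: str) -> bool:
    """Return True if the article URL belongs to an allowed business site."""
    host = (urlparse(url).hostname or "").lower()
    # Accept the domain itself or any subdomain of it (e.g. www.livemint.com).
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

Matching on the full host name matters here: a looser check on "indiatimes.com" alone would also let timesofindia through.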

So, to avoid this, a 'fuzzy string match' is applied against the headlines of the last 40 news articles posted by the bot.

('fuzzy string match' in plain English = instead of checking whether two sentences are exactly the same, we measure how similar they are. Identical sentences give a 100% match, while a sentence with a misspelt or different word gives a match lower than 100%.)

So, any article whose headline has more than a 50% match with any of the previous 40 headlines is not posted.
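Here is a rough sketch of that check in Python. It uses difflib.SequenceMatcher from the standard library as a stand-in for whatever fuzzy matching the bot really uses, so the exact percentages may differ from the bot's. The names similarity, is_duplicate, maybe_post and recent_headlines are made up for this example; only the numbers 40 and 50% come from the post.

```python
from collections import deque
from difflib import SequenceMatcher

RECENT_LIMIT = 40        # how many posted headlines to remember (from the post)
MATCH_THRESHOLD = 50.0   # percent; anything above this counts as a duplicate

# A deque with maxlen drops the oldest headline automatically
# once the 41st one is appended.
recent_headlines = deque(maxlen=RECENT_LIMIT)


def similarity(a: str, b: str) -> float:
    """Return how similar two headlines are, as a percentage (0 to 100)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def is_duplicate(headline: str, recent) -> bool:
    """True if the headline matches any recent headline above the threshold."""
    return any(similarity(headline, old) > MATCH_THRESHOLD for old in recent)


def maybe_post(headline: str) -> bool:
    """Record the headline and return True if it should be posted."""
    if is_duplicate(headline, recent_headlines):
        return False
    recent_headlines.append(headline)
    return True
```

Calling maybe_post("Sensex jumps 500 points as banks rally") on an empty history returns True; a follow-up call with "Sensex jumps 500 points as bank stocks rally" returns False, because the two headlines differ by only a few characters.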
