
@victusfate
Forked from igrigorik/summarizer.rb
Created December 6, 2010 18:34
ruby-1.9.2-p0 > require 'postrank-api'
ruby-1.9.2-p0 > pr = PostRank::API.new('ig')
ruby-1.9.2-p0 > pr.feed(pr.feed_info('igvita.com')['id'], :num => 5)['items'].map {|i| i['content'].gsub(/<\/?[^>]*>/, "").summarize}
[
[0] "The world of concurrent computation is a complicated one. Hence, not surprisingly, when Bruce Tate asked Matz, in an interview for his recent book (Seven Languages in Seven Weeks) for a feature that he would like to change in Ruby if he could go back in time, the answer was telling: “I would remove the thread and add actors or some other more advanced concurrency features”.\nProcess Calculi & Advanced Concurrency\nIt is easy to read a lot into Matz's statement, but the follow-up question is: more advanced concurrency features? Process calculi is the formal name for the study of many related approaches of modeling the behavior of concurrent systems, which provides many alternatives: CCS, CSP, ACP, and Actor models just to name a few.\nActors, CSP and Pi-calculus\nThe actor concurrency model, which is now gaining traction thanks to the recent success of the languages such as Erlang and Scala is a great example of an “alternative concurrency model” that is worth exploring. Instead of protecting our data structures by locks, and then contending to acquire the lock, this model encourages us to explicitly pass state from process to process. So, the question is not whether threads need to exist, but rather, whether they actually make for the best high-level interface to write, test, and manage code that requires concurrency, regardless of runtime.",
[1] "ZeroMQ sockets provide message-oriented messaging, support for multiple transports, transparent setup and teardown, and an entire array of routing patterns via different socket types - see quick refresher on ZeroMQ.\nRouting Devices: Queue, Streamer, Forwarder\nIf RabbitMQ is a specific message-broker implementation that codifies a set of allowed messaging patterns, then ZeroMQ is the low-level toolkit, which allows us to assemble these patterns at will.\nAssembling Custom ZMQ Devices\nWhile the devices shipped with ZeroMQ cover some of the most common use cases, the real power of the framework is in its flexibility to allow us to assemble arbitrary messaging patterns: mixing different socket types, defining custom routing strategies based on meta-data of request, and so on.\n\nzdevice (Ruby DSL for assembling ZeroMQ routing devices)\n\nIn the example above, we used ZDCF's JSON configuration syntax to open all the socket connections, and then defined a simple relay routing strategy in our start block. So, if you are looking for a ZeroMQ device capable of handling thousands of connections, with support for persistence, and a number of other features, then the combination of ZeroMQ and RabbitMQ is definitely worth a close look.",
[2] "If you have ever had the need to add full-text indexing or search capability to one of your projects, chances are you will be familiar with Apache Lucene or one of its many derivatives such as PyLucene, Lucene.NET, Ferret (Ruby), or Lucy (C port). Best of all, the number of projects around it, as well as the planned core improvements continues to impress - if you are looking for an open source search solution, Lucene is definitely worth a close look. Salesforce started with Lucene back in 2002 and today manages an 8TB+ index (~20 billion documents).\nLucene + HTTP: Solr Server\nIf Lucene is a low-level IR toolkit, then Solr is the fully-featured HTTP search server which wraps the Lucene library and adds a number of additional features: additional query parsers, HTTP caching, search faceting, highlighting, and many others. Solr and Lucene began as independent projects, but just this past year both teams have decided to merge their efforts - all around, great news for both communities.\nReal-time Search with Lucene\nReal-time search was a big theme at Lucene Revolution. Unlike many other IR toolkits, Lucene has always supported incremental index updates, but unfortunately it also required an fsync (flush new documents from memory to disk) and a reopen of the \"index reader\" to make those documents visible to the incoming requests. To achieve this, all Lucene indexes are maintained in memory in many small segments (up to 16 million tweets per segment) and are heavily optimized for Twitter's small document structure.\nDistributed Search\nOut of the box, Lucene does not provide any support for distributed indexes - your application can open multiple index readers, but all of that has to be coordinated manually.\nSolr, Lucene and NoSQL\nInstead of running Lucene or Solr in standalone mode, both are also easily integrated within other applications.\nUnlike Lucandra, Lily is not leveraging HBase as an index store (see HBasene for that), but runs standalone, albeit tightly integrated Solr servers for flexible indexing and query support.",
[3] "But, for a second, imagine if instead of rushing to build mobile apps which pull data off the web to local devices, what if we also had the infrastructure that could efficiently push the data back to the web? Not to mention, the capability to aggregate data from thousands of mobile devices for trends analysis, data-mining applications and so forth - a global mobile sensor network at your disposal! WebHooks allow us to establish callback (push) semantics between web-services, and PubSubHubbub solves the problem of efficiently delivering real-time notifications from a single publisher (mobile device, in this case) to many subscribers: the phone pushes a single update to the platform provider and the PSHB hub does all the hard work of distributing the individual updates to each subscriber. If we could efficiently aggregate activity feeds from thousands of mobile subscribers with rich geo and contextual meta-data (user, or device generated), then imagine all the numerous mash-ups and data-mining applications that could be built on top!",
[4] "Berkeley Sockets (BSD) are the de facto API for all network communication. This is exactly where the ZeroMQ (ØMQ/ZMQ) networking library comes in: \"it gives you sockets that carry whole messages across various transports like inproc, IPC, TCP, and multicast.\"\nStreams & Datagrams\nZeroMQ sockets provide a layer of abstraction on top of the traditional socket API, which allows it to hide much of the everyday boilerplate complexity we are forced to repeat in our applications. This means that if a client socket sends a 150kb message, then the server socket will receive a complete, identical message on the other end without having to implement any explicit buffering or framing.\nTransport Agnostic Sockets\nZeroMQ sockets are also transport agnostic: there is a single, unified API for sending and receiving messages across all protocols.\nRouting & Topology Aware Sockets\nZeroMQ sockets are routing and network topology aware. Since we don't have to explicitly manage the peer-to-peer connection state - all of that is abstracted by the library, as we saw above - nothing stops a single ZeroMQ socket from binding to two distinct ports to listen to for inbound requests, or in reverse, send data to two distinct sockets via a single API call.\n\nIn the case of a Publish/Subscribe socket pair (unidirectional communication from publisher to subscribers), the publisher socket will replicate the message to all connected clients (local IPC clients, remote TCP listeners, etc). In the case of a Request/Reply socket pair (bi-directional communication: server, client), the messages will be automatically load balanced by the socket generating the request to one of the connected clients. Of course, you can also control the queuing behavior of ZeroMQ sockets by setting an allowed memory bound and even a swap size for each socket.\nThe handlers, in turn, process the incoming requests (via Pull socket) and publish them to a \"Pub\" socket, to which the Mongrel2 server itself is subscribed to and is listening for its process ID (via a topic filter)."
]
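The `summarize` call in the session above comes from the forked summarizer.rb, which monkey-patches `String` (igrigorik's original wrapped the ots / Open Text Summarizer gem). As a self-contained illustration, here is a hypothetical, dependency-free sketch of such a `String#summarize`: the method name matches the session, but the naive word-frequency scoring is an assumption for demonstration, not the gist's actual implementation.

```ruby
# Hypothetical sketch of a String#summarize monkey-patch.
# The real gist delegates to the ots gem; this version scores
# sentences by average word frequency and keeps the top few.
class String
  def summarize(max_sentences = 3)
    # split on sentence-ending punctuation followed by whitespace
    sentences = split(/(?<=[.!?])\s+/)
    return self if sentences.size <= max_sentences

    # count how often each word appears in the whole text
    freq = Hash.new(0)
    downcase.scan(/[a-z']+/).each { |w| freq[w] += 1 }

    # score each sentence by the average frequency of its words
    scored = sentences.each_with_index.map do |s, i|
      words = s.downcase.scan(/[a-z']+/)
      score = words.empty? ? 0 : words.sum { |w| freq[w] } / words.size.to_f
      [score, i, s]
    end

    # keep the highest-scoring sentences, restored to original order
    scored.sort_by { |score, _, _| -score }
          .first(max_sentences)
          .sort_by { |_, i, _| i }
          .map { |_, _, s| s }
          .join(" ")
  end
end
```

Combined with the `gsub(/<\/?[^>]*>/, "")` tag-stripping step shown in the session, this is enough to turn raw feed HTML into short plain-text digests, though a real extractive summarizer (like OTS) uses stemming and a curated stop-word dictionary rather than raw counts.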