@ivan
Last active November 2, 2024 18:42
2024 reading list

Things I might read in 2024.



  • Antoine de Saint-Exupéry, Richard Howard (translator) - The Little Prince
  • (Translation by) Sam Hamill - Yellow River: Three Hundred Poems From the Chinese
  • Sayaka Murata, Ginny Tapley Takemori (translator) - Convenience Store Woman (via)
  • Jorge Luis Borges - Tlön, Uqbar, Orbis Tertius (in Labyrinths)/ printed (via)
  • Franz Kafka - The Metamorphosis (via)
  • William Olaf Stapledon - Star Maker/ audio, go to 12m35s to skip past the introduction spoilers

  • The Heart of Innovation: A Field Guide for Navigating to Authentic Demand/ audio (via)
  • Peter D. Kaufman - Poor Charlie's Almanack: The Wit and Wisdom of Charles T. Munger, Expanded Third Edition
  • Lia A. DiBello - Expertise in Business: Evolving with a Changing World (in The Oxford Handbook of Expertise) (via)
  • Joël Glenn Brenner - The Emperors of Chocolate: Inside the Secret World of Hershey and Mars
  • Elad Gil - High Growth Handbook/ audio
  • W. Edwards Deming - The New Economics for Industry, Government, Education/ audio
  • W. Edwards Deming - The New Economics for Industry, Government, Education/ the PDF or ebook
  • Henrik Karlsson - Escaping Flatland/ including the posts I SingleFile'd
  • the relevant-looking posts on benkuhn.net/posts
  • Commoncog Case Library Beta
  • Keith J. Cunningham - The Road Less Stupid: Advice from the Chairman of the Board/ audio
  • Keith J. Cunningham - The 4-Day MBA/ video
  • Cedric Chin's summary of 7 Powers
  • Akio Morita, Edwin M. Reingold, Mitsuko Shimomura - Made in Japan: Akio Morita and Sony
  • Nomad Investment Partnership Letters or redacted (via)
  • How to Lose Money in Derivatives: Examples From Hedge Funds and Bank Trading Departments
  • Brian Hayes - Infrastructure: A Guide to the Industrial Landscape
  • Accelerated Expertise (via)/ printed, "read Chapters 9-13 and skim everything else"
  • David J. Gerber - The Inventor's Dilemma (via Oxide and Friends)
  • Alex Komoroske - The Compendium / after I convert the Firebase export in code/websites/compendium-cards-data/db.json to a single HTML page
  • Rich Cohen - The Fish That Ate The Whale (via)
  • Bob Caspe - Entrepreneurial Action/ printed, skim for anything I don't know



Interactive fiction


unplanned notable things read


unplanned and abandoned

  • Ichiro Kishimi, Fumitake Koga - The Courage to Be Disliked/ audio
  • Matt Dinniman - Dungeon Crawler Carl/ audio
  • Charles Eisenstein - The More Beautiful World Our Hearts Know Is Possible/ audio
  • Geoff Smart - Who: The A Method for Hiring/ audio
  • Genki Kawamura - If Cats Disappeared from the World/ audio
  • Paul Stamets - Fantastic Fungi: How Mushrooms Can Heal, Shift Consciousness, and Save the Planet/ audio

ivan commented Jul 24, 2024

Swelling and bulging in particular are telling signs that the food inside has begun to go bad and is causing the abnormal shape. If you see a bloated can, it's usually just a safer bet to not open the can at all and just throw it out; but if you're still unsure, you can open the can and inspect the insides. Signs of spoilage will vary depending on the actual contents of the can but in general, discoloration, abnormal growths of some kind (like mold), and foul odors are indications of spoilage.

What causes the swelling

Microbial spoilage and the hydrogen gas it produces — which occurs through the interaction between the metal of the can and acids from the contents — are the principal culprits for a bloated can, per the FDA. There are also microbes that do not release gas but can still render the food inside the can inedible; thereby, they may not cause swelling but still can cause spoilage.

According to the U.S. Department of Agriculture, as a rule, it is best to never consume food in a bloated can as the likelihood of spoilage is quite high. Likewise cans that are leaking, badly dented, or have loose lids should also be avoided. These cans run the particularly dangerous risk of containing Clostridium botulinum, which can be fatal even in small quantities; for this reason, you should never attempt to taste canned food to test whether or not it is safe. If you see a bloated can, it is just safest to dispose of it promptly.

https://www.tastingtable.com/863613/is-it-actually-dangerous-to-eat-food-from-a-bloated-can/

You can tell by inspecting the can. If it’s contaminated then organisms will grow inside and produce gas, which puts pressure on the inside. It will look like someone blew up a balloon a little too much.

If there is a sign like this, do not eat the contents of the can, you are highly likely to become very ill or worse.

Edit: as mentioned below, you should not even open a can if it’s inflated, as it could be contaminated with botulism. Botulism is a poisonous substance some bacteria produce - it’s possible to breathe it in and receive a dose of botulism, so don’t even open a can that shows signs of damage or contamination.

https://old.reddit.com/r/Damnthatsinteresting/comments/1eay4ge/opening_a_17_year_old_can_of_corn/leouqcd/


ivan commented Jul 24, 2024

Obstinacy is a simple thing. Animals have it. But persistence turns out to have a fairly complicated internal structure.

One thing that distinguishes the persistent is their energy. At the risk of putting too much weight on words, they persist rather than merely resisting. They keep trying things. Which means the persistent must also be imaginative. To keep trying things, you have to keep thinking of things to try.

Energy and imagination make a wonderful combination. Each gets the best out of the other. Energy creates demand for the ideas produced by imagination, which thus produces more, and imagination gives energy somewhere to go. [5]

Merely having energy and imagination is quite rare. But to solve hard problems you need three more qualities: resilience, good judgement, and a focus on some kind of goal.

Resilience means not having one's morale destroyed by setbacks. Setbacks are inevitable once problems reach a certain size, so if you can't bounce back from them, you can only do good work on a small scale. But resilience is not the same as obstinacy. Resilience means setbacks can't change your morale, not that they can't change your mind.

[...]

When you look at the internal structure of persistence, it doesn't resemble obstinacy at all. It's so much more complex. Five distinct qualities — energy, imagination, resilience, good judgement, and focus on a goal — combine to produce a phenomenon that seems a bit like obstinacy in the sense that it causes you not to give up. But the way you don't give up is completely different. Instead of merely resisting change, you're driven toward a goal by energy and resilience, through paths discovered by imagination and optimized by judgement. You'll give way on any point low down in the decision tree, if its expected value drops sufficiently, but energy and resilience keep pushing you toward whatever you chose higher up.

https://paulgraham.com/persistence.html


ivan commented Jul 24, 2024

Mr. Big (sometimes known as the Canadian technique) is a covert investigation procedure used by undercover police to elicit confessions from suspects in cold cases (usually murder). Police officers create a fictitious grey area or criminal organization and then seduce the suspect into joining it. They build a relationship with the suspect, gain their confidence, and then enlist their help in a succession of criminal acts (e.g., delivering goods, credit card scams, selling guns) for which they are paid. Once the suspect has become enmeshed in the criminal gang, they are persuaded to divulge information about their criminal history, usually as a prerequisite for being accepted as a member of the organization.

[...]

The use of this technique is essentially prohibited in some countries, including the United Kingdom[4] and the United States.[5] In Germany, which has high standards for what constitutes a voluntary confession, it may be more difficult to use confessions obtained by this technique.[5] The procedure has been used by police in Australia[6] and New Zealand,[7] and its use has been upheld by courts in both countries.

https://en.wikipedia.org/wiki/Mr._Big_(police_procedure)


ivan commented Jul 24, 2024

The human plane of language becomes a zone of shelter, of reassurance. The cold plane of the posthuman is the one in which a formal axiomatic logic, which is not yet a language, unfolds like Von Neumann automata, “tiling the world”. The technologist is able to work out the steps of what is possible next, although it is not possible to communicate it without becoming alien. The technologist does not think about what he will be able to convince other people regarding, he thinks about what marching dolls he is able to wind up. The language of code is different from the language of speech because the language of code is not addressed to another. The language of code does not “care if it is read”, though it is read, by robots and scanners. The language of code is like the mute language of DNA — the original paradigm of a “code” — read by ribosomes.

https://angelicism.substack.com/p/technologists


ivan commented Jul 24, 2024

One thing is universal among these people: the sense that the left has been betrayed. There is some loosely assented to project among these folks to create socialism or something like that, and Nikki and Anna have strayed too far from it. This betrayal happens over and over, to these weary philosophers — you think someone is a fellow leftist, cheers, next round’s on me — but they turn out to be a “grifter” — this is the eternal woe of the leftist; that someone would become interested in making money rather than having pointless debates all day. It is universally assented to on “philosophy twitter” that it is very important to figure out how to create socialism by synthesizing Marx, Hegel, Deleuze, Althusser, and maybe psychoanalysis. No one has figured out the correct way to put all these thinkers in dialogue but when they do, socialism will happen. Every user on philosophy twitter has their own incompatible sense of the true leftist tradition which they will brashly defend against the others, accusing the others of being reactionary for defecting against the half-formed vision of the true doctrine they assemble in their mind.

The shining light of Anna has come to explain to these chattering fools what should be profoundly obvious. Nowhere in the world does these philosophers’ “leftism” exist. But they presuppose that it must. Their leftism is a kind of spirit of justice which must, axiomatically, always be held to in thought and action even as it is trampled again and again in struggle, even as its letter becomes more and more undefinable, even as it becomes a form of nostalgia for a struggle which only once was. Thus their leftism is a basic religious commitment (wokeness is like a religion). What Anna challenges them to do is make it self-conscious as a religious commitment, for it has no other coherent meaning.

https://angelicism.substack.com/p/the-left-is-humiliated-and-conquered


ivan commented Jul 25, 2024

The benefit of journaling is not just reentry, but that you begin to solidify the mental model into a concrete branching of possibilities that is tightly coupled to the specific problem. Your work becomes traversal and mutation of this tree. Several benefits accrue: you begin to see gaps in the tree, and can fill them in. You begin to have confidence in your mental model, recovering the time you used to spend going over the same nodes again and again in a haphazard way. In distributed systems in particular, the work is often detailed, manual, error prone and high latency - with a solid mental model you can get through a checklist of steps with minimum difficulty and high confidence that you didn't miss anything. This ability to take something abstract and make it more concrete on the fly is a critical skill.

Perhaps the greatest barrier to using it is akin to envy. We see others who apparently do this without written materials, in their head. I think we see this as evidence of intellectual superiority and harbor the doubt that using an aid like a journal means we are somehow lacking in skill or ability. This is wrong. Using an aid to map out complex problems isn't a failure, it's essential, especially for problems in systems you've never used before. Over time you may yourself build up your expertise such that you no longer need the aid, but that doesn't signal anything about your intelligence or ability either, only your experience.

https://news.ycombinator.com/item?id=40950584


ivan commented Jul 25, 2024

What finally got this to stick for me was abandoning all notion of structure and organization (and formal concepts like “logging” and “journaling”) and optimizing fully for capture over retrieval, then relying on search tools and proximity for the latter.

[...]

It took me a couple years to realize this too. For the past five years I abandoned all structure. I use a literal log file. Chronological from top to bottom, with paragraph breaks for each workday. Higher than necessary verbosity, no points taken away for spelling or grammar mistakes.

https://news.ycombinator.com/item?id=40950584
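A minimal sketch of the "literal log file" approach described above, assuming single-line entries and a timestamp prefix. The function names `append_entry` and `search` are hypothetical and the timestamp format is an illustration, not something the commenter specifies.

```python
from datetime import datetime
from pathlib import Path


def append_entry(log_path: Path, text: str, now: datetime) -> None:
    """Append one timestamped line; leave a blank line when the day changes."""
    stamp = now.strftime("%Y-%m-%d %H:%M")
    last_day = None
    if log_path.exists():
        lines = [ln for ln in log_path.read_text(encoding="utf-8").splitlines() if ln.strip()]
        if lines:
            # Assumption: every non-blank line starts with a YYYY-MM-DD date.
            last_day = lines[-1][:10]
    with log_path.open("a", encoding="utf-8") as f:
        if last_day is not None and last_day != stamp[:10]:
            f.write("\n")  # paragraph break between workdays
        f.write(f"{stamp} {text}\n")


def search(log_path: Path, needle: str) -> list[str]:
    """Retrieval is plain case-insensitive substring search, with no structure."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [ln for ln in lines if needle.lower() in ln.lower()]
```

Capture stays a single append call, and retrieval relies on search plus chronological proximity rather than any organizing scheme.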


ivan commented Jul 25, 2024

I cannot work without taking notes. It is a process of thinking, sorting my ideas, documenting steps and outcomes, pausing, practicing meta-cognition, gaining clarity and confidence along the process. Plus I have the benefit to go back to my notes and have instant access to what I did days, months, years ago. So I can’t understand how people are working without taking notes, documenting (for themselves), and journaling.

https://news.ycombinator.com/item?id=40952344


ivan commented Jul 25, 2024

we will approach all methods in terms of their intended effects—that is, what they were designed to do and what happens when you engage with them

https://vajrayananow.com/about-my-approach


ivan commented Jul 25, 2024

We tried to give away the technology, and it didn’t work! So now we are selling it, with considerably more success.

https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html


ivan commented Jul 25, 2024

The major omission in this article is fuzzing. Not only is it practical and in wide (and growing) use, it’s also far more advanced than QuickCheck’s approach of generating random inputs, because fuzzing can be _coverage-driven_. Property-based testing came out of academia and fuzzing came out of security research; initially they were not connected. But with the advent of in-process fuzzing (through libFuzzer), which encourages writing small fuzz tests rather than testing entire programs, and structure-aware fuzzing, which enables testing more than just functions that take a bytestring as input, in my view the two techniques have converged. It’s just that the two separate communities haven’t fully realized this yet.

One pitfall with non-coverage-driven randomized testing like QuickCheck, is that how good your tests are depends a lot on the generator. It may be very rarely generating interesting inputs because you biased the generator in the wrong way, and depending on how you do the generation, you need to be careful to ensure the generator halts. With coverage-driven fuzzing all of these problems go away; you don’t have to be smart to choose distributions so that interesting cases are more common, coverage instrumentation will automatically discover new paths in your program and drill down on them.

But isn’t fuzzing about feeding a large program or function random bytestrings as inputs, whereas property-based testing is about testing properties about data structures? It is true that fuzzers operate on bytestrings, but there is no rule that says we can’t use that bytestring to generate a data structure (in a sense, replacing the role of the random seed). And indeed this is what the Arbitrary crate [1] in Rust does, it gives tools and even derive macros to automatically generate values of your data types in the same way that QuickCheck can. The fuzzing community calls this Structure-Aware Fuzzing and there is a chapter about it in the Rust fuzzing book [2]. There are also tools like libprotobuf-mutator [3] that substitute fuzzers’ naive mutation strategies, but even with naive strategies fuzzers can usually get to 100% coverage with appropriate measures (e.g. recomputing checksums after mutation, if the data structure contains checksums).

I am using this extensively in my own projects. For example, RCL (a configuration language that is a superset of JSON) contains multiple fuzzers that test various properties [4], such as idempotency of the formatter. In the beginning it used the raw source files as inputs but I also added a more advanced generator that wastes fewer cycles on inputs that get rejected by the parser. The fuzzer has caught several bugs, and most of them would have had no hope of being found with naive randomized testing, because they required a cascade of unlikely events.
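A rough sketch of the idea in Python, not the Arbitrary crate or RCL's actual fuzzers: the fuzzer's byte string stands in for the random seed and drives a generator that only produces well-formed input, which is then checked for formatter idempotency. Every name here (`ByteSource`, `gen_value`, `render`, `format_source`, `fuzz_one`) is hypothetical, the formatter is a toy, and `os.urandom` stands in for the coverage-guided fuzzer that would normally supply the bytes.

```python
import os


class ByteSource:
    """Hands out bytes from a fixed input; returns 0 once the input is exhausted."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def next_byte(self) -> int:
        if self.pos >= len(self.data):
            return 0
        b = self.data[self.pos]
        self.pos += 1
        return b


def gen_value(src: ByteSource, depth: int = 0):
    """Build an int, a string, or a nested list, driven entirely by the bytes."""
    kind = src.next_byte() % 3
    if kind == 0 or depth >= 3:  # depth cap plus zero-padding guarantee the generator halts
        return src.next_byte()
    if kind == 1:
        length = src.next_byte() % 4
        return "".join(chr(ord("a") + src.next_byte() % 26) for _ in range(length))
    length = src.next_byte() % 4
    return [gen_value(src, depth + 1) for _ in range(length)]


def render(value, src: ByteSource) -> str:
    """Serialize the value with byte-driven padding, so the text is always parseable."""
    pad = " " * (src.next_byte() % 3)
    if isinstance(value, list):
        inner = ",".join(render(v, src) for v in value)
        return f"{pad}[{inner}]{pad}"
    if isinstance(value, str):
        return f'{pad}"{value}"{pad}'
    return f"{pad}{value}{pad}"


def format_source(text: str) -> str:
    """Toy formatter: drop all whitespace, then put one space after each comma."""
    compact = "".join(text.split())
    return compact.replace(",", ", ")


def fuzz_one(data: bytes) -> None:
    """Property under test: formatting already-formatted text changes nothing."""
    src = ByteSource(data)
    text = render(gen_value(src), src)
    once = format_source(text)
    assert format_source(once) == once, (data, text)


if __name__ == "__main__":
    for _ in range(1000):
        fuzz_one(os.urandom(32))  # stand-in for fuzzer-provided input
```

Because the generator never emits input a parser would reject, no cycles are spent on malformed text; a real setup would hand `fuzz_one` to a coverage-guided fuzzer instead of a random loop.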

Structure-aware fuzzing is not limited to generating data structures either, you can use it to generate reified commands to test a stateful API, as you describe in the _Stateful property-based testing_ section. The Rust fuzzing book has an example of this [5], and I use this approach to fuzz a tree implementation in Noblit [6].
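The reified-commands idea can be sketched the same way, again in Python rather than Rust and with hypothetical names throughout (`SortedSet`, `decode_commands`, `run_commands`). The bytes are decoded into a command sequence, which is replayed against both the implementation under test and a trivially correct model; this is an illustration, not the Noblit fuzz target or the example from the Rust fuzzing book.

```python
import bisect
import os


class SortedSet:
    """Toy implementation under test: a set stored as a sorted list."""

    def __init__(self) -> None:
        self.items = []

    def insert(self, key: int) -> None:
        i = bisect.bisect_left(self.items, key)
        if i == len(self.items) or self.items[i] != key:
            self.items.insert(i, key)

    def remove(self, key: int) -> None:
        i = bisect.bisect_left(self.items, key)
        if i < len(self.items) and self.items[i] == key:
            del self.items[i]

    def contains(self, key: int) -> bool:
        i = bisect.bisect_left(self.items, key)
        return i < len(self.items) and self.items[i] == key


def decode_commands(data: bytes):
    """Turn raw bytes into (operation, key) pairs, two bytes per command."""
    ops = ("insert", "remove", "contains")
    return [(ops[data[i] % 3], data[i + 1] % 16) for i in range(0, len(data) - 1, 2)]


def run_commands(commands) -> None:
    """Replay commands on the implementation and on a built-in set used as the model."""
    impl, model = SortedSet(), set()
    for op, key in commands:
        if op == "insert":
            impl.insert(key)
            model.add(key)
        elif op == "remove":
            impl.remove(key)
            model.discard(key)
        else:
            assert impl.contains(key) == (key in model), (op, key)
        # Invariant checked after every step: same contents, in sorted order.
        assert impl.items == sorted(model), commands


if __name__ == "__main__":
    for _ in range(1000):
        run_commands(decode_commands(os.urandom(64)))  # stand-in for fuzzer input
```

Keeping keys in a small range (`% 16`) makes collisions between inserts and removes likely, which is where stateful bugs tend to hide.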

[1]: https://docs.rs/arbitrary/latest/arbitrary/
[2]: https://rust-fuzz.github.io/book/cargo-fuzz/structure-aware-fuzzing.html
[3]: https://github.com/google/libprotobuf-mutator
[4]: https://docs.ruuda.nl/rcl/testing/#fuzz-tests
[5]: https://rust-fuzz.github.io/book/cargo-fuzz/structure-aware-fuzzing.html#example-2-fuzzing-allocator-api-calls
[6]: https://github.com/ruuda/noblit/blob/a0fd1342c4aa6e05f2b1c4e2929804c82e348ae2/fuzz/fuzz_targets/htree_insert.rs

https://news.ycombinator.com/item?id=40877028


ivan commented Jul 25, 2024

The following instructions do not require any plugins or addons, which is a bonus.

  1. Find your profile folder.
  2. Navigate to the subfolder 'chrome'. If it doesn't exist, create it.
  3. Now inside that folder, create an empty file called 'userContent.css'.
  4. Open that file in a text editor, and add this text (all on one line): @font-face { font-family: 'Helvetica'; src: local('Arial'); }
  5. Save that file and restart firefox.

https://stackoverflow.com/questions/1210975/how-do-i-block-or-substitute-a-certain-font-in-firefox
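Steps 2 to 4 above can be scripted. This is a minimal sketch assuming you already know your profile folder; the function name `install_font_override` is hypothetical, and the script does not locate the profile or restart Firefox for you.

```python
from pathlib import Path

# The rule from step 4, written on one line.
FONT_RULE = "@font-face { font-family: 'Helvetica'; src: local('Arial'); }\n"


def install_font_override(profile_dir: Path) -> Path:
    """Create chrome/userContent.css in the given profile and add the rule once."""
    chrome_dir = profile_dir / "chrome"
    chrome_dir.mkdir(exist_ok=True)  # step 2: create the subfolder if missing
    css_path = chrome_dir / "userContent.css"
    existing = css_path.read_text(encoding="utf-8") if css_path.exists() else ""
    if FONT_RULE.strip() not in existing:
        with css_path.open("a", encoding="utf-8") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write(FONT_RULE)  # steps 3 and 4: create the file and add the rule
    return css_path
```

Call it as `install_font_override(Path("/path/to/your/profile"))`, then restart Firefox. In recent Firefox versions, userContent.css appears to be loaded only when `toolkit.legacyUserProfileCustomizations.stylesheets` is set to true in about:config, so check that preference if the rule has no effect.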


ivan commented Jul 25, 2024

Hydration works differently

Svelte 5 makes use of comments during server side rendering which are used for more robust and efficient hydration on the client. As such, you shouldn't remove comments from your HTML output if you intend to hydrate it, and if you manually authored HTML to be hydrated by a Svelte component, you need to adjust that HTML to include said comments at the correct positions.

sveltejs/svelte@528d346

This just cost us about two days of three people debugging.

==> Removing whitespace and comments will break Vue’s hydration strategy when using server side rendering.

https://community.cloudflare.com/t/omit-formatted-comments-from-minification/18572/28?page=2


ivan commented Jul 25, 2024

Wikipedia trench warfare is an elaborate game, opaque and bizarre for outsiders to even contemplate, in which motivated figures fight to exhaustion over often trivial-seeming changes with deep significance to participants. Given that, I’ll expend my last remaining bit of sanity to bring legibility to a few of Gerard’s skirmishes. When Gerard fixates on something within an article, he touches it up via a series of gradual, mild tweaks: often individually defensible, usually citing one policy or another, all pointing one direction. He removes neutral information tangential to his fixation, gradually expands and adds citations to the sections he fixates on, and aggressively reverts any change that goes against his vision. When challenged, he raises policy names, invites editors to escalate, requests hard proof for straightforward claims he knows are true, accuses opponents of being fringe conspiracists, and if all else fails, simply goes silent and waits for people to shift their focus before returning to what he wanted to do in the first place.

https://www.tracingwoodgrains.com/p/reliable-sources-how-wikipedia-admin


ivan commented Jul 25, 2024

If China wanted to make the yuan a true global reserve currency, they would need to embrace massive financial deregulation and abolish their currently strict capital controls, in order to allow massive inflows of foreign held currency and yuan into China. But China needs to maintain its strict financial regulation for domestic economic success, and political stability. China is unlikely to ever decide to abandon the statist model it has followed for decades just to make itself a better hub for the international financial system.

https://keithwoods.pub/p/the-shadow-money-system-that-rules


ivan commented Jul 25, 2024

[Cloudflare is] now processing an average of 57 million HTTP requests/second (+23.9% YoY) and 77 million at peak (+22.2% YoY). From a DNS perspective, we are handling 35 million DNS queries per second (+40% YoY).

https://blog.cloudflare.com/application-security-report-2024-update


ivan commented Jul 25, 2024

The good thing about open source software is that it’s reusable intellectual property. Consultants can work for one company using it, then move on, and use it to do consulting for someone else. Open source software is great because talented engineers can work where people can see them, on something they can show, and something they can keep.

https://news.alvaroduran.com/p/we-love-writing-software-so-much


ivan commented Jul 26, 2024

I think the ever-evolving nature of conspiracies is actually pretty important to psychologically grasping their appeal. I have a friend who is a big believer in 9/11 Trutherism. He once compelled me to watch the documentary “The New Pearl Harbor,” an exhausting 5-hour film promoting 9/11 conspiracies. If one actually watches, one quickly discovers that a lot of 9/11 conspiracy theories are mutually exclusive, or at least don’t mesh well together: One conspiracy argues that fighter jets were intentionally diverted the wrong direction to keep them from shooting down the hijacked jets approaching New York, while another conspiracy suggests that United 93 was shot down, and it was all covered up. In some versions, the planes didn’t hit the Twin Towers at all. Sometimes Bush did it, and sometimes Israel did it, and so on. 

Similarly, in my career I’ve worked adjacent to people who, like RRN, were very hostile to Covid-19 shots. That hostility made them sequentially endorse wildly different assertions about how the vaccines worked. Sometimes, the vaccines contain heavy metals. Sometimes, they contain hydra DNA to turn recipients into partially non-human chimeras. Sometimes, the vaccines are a depopulation agent. Sometimes, they’re a mind-control agent, or a killswitch that can be activated by self-assembling nanomachinery. One viral documentary in 2022 claimed that Covid was caused by snake venom in the water supply, and that Covid vaccines were an additional dose of snake venom to keep people sick (all this, of course, because the snake is Satan’s animal).

What stands out isn’t the silliness of these particular theories, but that I saw them sequentially endorsed by the same people.

Some of these people are smart enough to notice inconsistencies, at least when they’re pointed out, so why don’t they bother them? To some extent, I think it’s for the same reason people don’t care that every Batman story doesn’t perfectly line up. Consistency isn’t the point! What actually matters is enjoying individual stories and the wider genre they fit into. Covid vaccine haters don’t think too hard about any specific story. Instead, they’re driven by a core impulse of “distrust the new vaccine that people I distrust are promoting,” and every conceivable story or tale that feeds that genre of thought is, for them, worthwhile.

Similarly, Real Raw News fans don’t think too hard about any specific story. Instead, I think their core impulse is, ironically, profound disappointment in how the Trump administration failed to deliver. Trump shook up the American political landscape more than anyone in living memory, and promised sweeping changes to every level of American government, yet his actual administration proved rather disorderly, changed far less than was promised, and then lost power after one term. For many, this simply prompted a revision in how they saw Trump. But for others, the preferred response is to embrace a fantasy reality where Trump is a superhero.

[...]

Imagine you are an ordinary, mildly engaged American citizen. You live far from the halls of power, you work an ordinary job, and whatever your feelings on political issues, you rarely see elections translate in a clear way to your own daily life. You might be interested in Washington, but Washington really isn’t that interested in you. 

Online, the world throws a million potential narratives at you. In some of them, the world is a confusing mess of moral gray areas. In others, the people you care about are winning. But in some narratives, you’re the hero, the people you like do good things, and the bad guys get what they deserve. The superficial evidence for all of these narratives is about equally convincing, at a glance. Look outside, and it’s hard to see the impact of any of the stories. Your entire understanding of reality is mediated through what sites you choose to read and what videos you choose to watch. As a politically marginal person, it won’t matter what you as an individual choose to believe. 

So, what happens if you choose to believe the story you find most enjoyable? And what if millions of others choose the same?

https://www.astralcodexten.com/p/your-book-review-real-raw-news


ivan commented Jul 26, 2024

There is no technical improvement good enough to compensate for poor usability. If the app takes 0.5ms to do something but I have to go through 3 different screens, and another takes 2ms but one button is enough, then in everyday use the second will be faster, even if it is less efficient.

We have a Ferrari under the hood, but without electric steering, with manual windows and old brake pads.

https://old.reddit.com/r/OvercastFm/comments/1ecra74/the_official_better_than_ever_thread_list_all_the/

#ux


ivan commented Jul 28, 2024

salon-style spaces for people to talk about important issues and difficult ideas while resisting their reflexive reduction to existing ideological oppositions

https://partiful.com/e/pmrQzkZEienFcvzwJJ9q


ivan commented Jul 29, 2024

Ask HN: Strategies to Reduce AI Hallucinations?

Feed it grounding text that's about as long as the output text you expect it to produce.

They are called transformers for a reason.

https://news.ycombinator.com/item?id=41055736


ivan commented Jul 30, 2024

As an Essentialist, distilling, organizing and simplifying is your call. It doesn't matter where you go, whether at work or home or on vacation (or a restaurant, store, experience, etc), you see chaos, mess, complexity and it triggers a near-primal urge to create order and simplicity.

It could be complex information, ideas, spreadsheets or data-sets, toys in a room, items in a display, clothes on a rack, books on a shelf, physical or digital, it doesn't really matter. Your brain immediately goes into distill and simplify mode. You think in systems and processes designed to create space, order and efficiency. It's what breathes you.

This is one of the Sparketypes that often expresses itself very early in life. Unlike others where you need to "go out into the world" to experience activities and moments that give you the raw data to really get a sense for what lights you up, nearly every moment of every day provides opportunities for Essentialists to embrace and employ their Sparketype. Because, it turns out, the world is a largely chaotic, complex and massively disorganized place.


ivan commented Jul 30, 2024

Attachment as a driver of work will almost always produce good results. Unfortunately it may not produce useful results, or results that society wants. The Artist that aligns with society is rare.

https://www.eristicstest.com/the_artist/


ivan commented Jul 31, 2024

  • If people have a good “user experience” when they interact with you, then they will want to interact with you more in the future.

https://dynomight.substack.com/p/advice


ivan commented Aug 2, 2024

Ask HN: What is the best software to visualize a graph with a billion nodes?

Visualizing large graphs is a natural desire for people with lots of connected data. But after a fairly small size, there's almost no utility in visualizing graphs. It's much more useful to compute various measures on the graph, and then query the graph using some combination of node/edge values and these computed values. You might subset out the nodes and edges of particular interest if you really want to see them -- or don't visualize at all and just inspect the graph nodes and edges very locally with some kind of tabular data viewer.

It used to be thought that visualizing super large graphs would reveal some kind of macro-scale structural insight, but it turns out that the visual structure ends up becoming dominated by the graph layout algorithm and the need to squash often inherently high-dimensional structures into 2 or 3 dimensions. You end up basically seeing patterns in the artifacts of the algorithm instead of any real structure.

There's a similar, but unrelated desire to overlay sequenced transaction data (like transportation logs) on a geographical map as a kind of visualization, which also almost never reveals any interesting insights. The better technique is almost always a different abstraction like a sequence diagram with the lanes being aggregated locations.

There's a bunch of these kinds of pitfalls in visualization that people who work in the space inevitably end up grinding against for a while before realizing it's pointless or there's a better abstraction.

https://news.ycombinator.com/item?id=41132095
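To illustrate the "compute measures, then query and subset" workflow from this comment, here is a small sketch using node degree as the measure. The edge list and the function names (`degrees`, `top_nodes`, `induced_subgraph`) are made up for illustration; a real billion-node graph would need a graph database or a distributed engine rather than in-memory lists.

```python
from collections import defaultdict


def degrees(edges):
    """Compute a simple per-node measure: the number of incident edges."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def top_nodes(deg, k):
    """Query by the computed measure: the k highest-degree nodes, ties broken by name."""
    return sorted(deg, key=lambda n: (-deg[n], n))[:k]


def induced_subgraph(edges, keep):
    """Subset out only the edges whose endpoints are both of interest."""
    keep = set(keep)
    return [(a, b) for a, b in edges if a in keep and b in keep]


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
deg = degrees(edges)
hubs = top_nodes(deg, 3)
print(hubs)                           # ['a', 'b', 'c']
print(induced_subgraph(edges, hubs))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

The resulting subgraph is small enough to inspect in a table or, if still wanted, to draw.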


ivan commented Aug 4, 2024

Intellectual property is absolutely a moat, but it's not the "Intellectual Property" as defined in law. It's the implicit, tacit knowledge & what exists in the collective mind of the team. But it's still property of a sort, the shareholders have a beneficial claim to it.

https://x.com/J_L_Colvin/status/1820119713654984913


ivan commented Aug 4, 2024

Reinventing the wheel is a terrible metaphor in a lot of cases. I understand why it exists, but it seems to discourage trying something by yourself the first time. There is a lot of value in rediscovering something because you understand it a lot more, and in many contexts, that personal discovery is key.

https://x.com/inkolore_/status/1818739846774768064

ivan commented Aug 4, 2024

ivan commented Aug 5, 2024

Wow, not a single mention of Whisper on this entire first page of comments! I think Whisper is really cool: the large model can pull speech out of even heavily distorted (wind noise, clipping, etc.) audio. I have a story to illustrate why running Whisper locally on your own is not so easy! Much easier to sign up for the OpenAI API.

In my research I found that pre-processing the audio to reduce noise (using the IMO best-in-class FB research "denoiser") actually increases WER (word error rate, where higher is worse). This was surprising! From a human perspective, I assumed bringing up the "signal" would increase accuracy. But it seems that, from a machine perspective, there's actually "information" to be gleaned from the heavily distorted noise part of the signal. To me, this is amazing because it reveals a difference in how machines vs humans process audio. The implication is that there is actually speech signal inside the noise, as if the voice has bounced off and interacted with the noise source (wind, fan, etc.), altered those sounds, and left its impression, and that this information can then be used and contributes to the inference. Incredible!
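For reference, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. A minimal sketch follows; the function name and the plain whitespace tokenization are assumptions, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the output contains many extra words, which is how hallucinated text during silence shows up in the metric.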

With Whisper: I started with the standard Python models. They're kind of slow. I tried compiling Python into a single binary using various tools. That didn't work. Then I found whisper.cpp--fantastic! A port of Whisper to C++ that is so; much; faster. Mind-blowing speed! Plus easy compilation. My use case was including transcription in a private, offline "transcribe anything" macOS app. whisper.cpp was the way to go.

Then I encountered another problem: what the "Whisperists" (experts in this nascent field, I guess) call "hallucination". The model will "hallucinate". I found this hilarious! Another cross-over of human-machine conceptual models, our forever anthropomorphizing everything effortlessly. :)

Basically hallucination includes: feed Whisper a long period of silence, and the model is so desperate to find speech, it will infer (overfit? hallucinate?) speech out of the random background signal of silence / analog silence / background noise. Normally this presents as a loop of repeats of a previous accurately transcribed phrase. Or, with smaller models, some common "end-of-YouTube-video" phrases like "Thank You!" or even "Thanks for Watching". I even got (from one particularly heavily distorted section, completely inaccurate) "Don't forget to like and subscribe!" Haha. The larger models produce fewer hallucinations, and fewer generic "oh-so-that's-what-your-dataset-was!" hallucinations. But they do still hallucinate. Especially during silent sections.
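The "loop of repeats" symptom can be caught by a crude post-filter on the transcript. The sketch below is a hypothetical heuristic, not part of Whisper or whisper.cpp: it keeps at most `max_repeats` consecutive copies of the same segment text. It will also drop genuine repetition in speech, so treat it as a diagnostic aid.

```python
from itertools import groupby

def collapse_repeats(segments, max_repeats=2):
    """Keep at most max_repeats consecutive copies of the same segment text.

    Segments are compared case-insensitively with surrounding whitespace
    ignored; the original strings are returned unchanged.
    """
    cleaned = []
    for _, run in groupby(segments, key=lambda s: s.strip().lower()):
        cleaned.extend(list(run)[:max_repeats])
    return cleaned

if __name__ == "__main__":
    transcript = [
        "Hello there.",
        "Thank you.", "Thank you.", "Thank you.", "Thank you.",
        "Next point.",
    ]
    print(collapse_repeats(transcript, max_repeats=1))
```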

At first, I tried using ffmpeg to chop the audio into small segments, ideally partitioned on silences. Unfortunately ffmpeg can only chop it into regular-size segments, but it can output silence intervals, and you can chop around those (though not "online" / in real time, as I was trying to achieve). Removing the silent segments (even with the imperfect metric of "some %" of average output signal magnitude (sorry for my terminology, I'm not an expert in DSP/audio)) drastically improved Whisper performance. Suddenly it went from hallucinating during silent segments to perfect transcripts.
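The silence intervals mentioned above can come from ffmpeg's `silencedetect` audio filter, for example `ffmpeg -i input.wav -af silencedetect=noise=-30dB:d=0.5 -f null -`, which logs `silence_start` and `silence_end` lines to stderr. The sketch below parses such a log and inverts it into the spans to keep. The threshold values in that command, the function names, and the sample log are assumptions for illustration, and this may not match what the author actually did.

```python
import re

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")

def parse_silences(log_text, total_duration):
    """Return [(start_s, end_s), ...] silence intervals from a silencedetect log."""
    silences = []
    start = None
    for line in log_text.splitlines():
        match = SILENCE_START.search(line)
        if match:
            # Reported start times can be slightly negative; clamp to zero.
            start = max(0.0, float(match.group(1)))
            continue
        match = SILENCE_END.search(line)
        if match and start is not None:
            silences.append((start, float(match.group(1))))
            start = None
    if start is not None:
        # Silence still open at end of log: assume it runs to end of file.
        silences.append((start, total_duration))
    return silences

def speech_spans(silences, total_duration):
    """Invert silence intervals into the non-silent spans to transcribe."""
    spans = []
    cursor = 0.0
    for start, end in silences:
        if start > cursor:
            spans.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_duration:
        spans.append((cursor, total_duration))
    return spans

# Hypothetical log excerpt in the shape silencedetect prints.
SAMPLE_LOG = """\
[silencedetect @ 0x55d] silence_start: 0
[silencedetect @ 0x55d] silence_end: 1.5 | silence_duration: 1.5
[silencedetect @ 0x55d] silence_start: 4.25
[silencedetect @ 0x55d] silence_end: 6 | silence_duration: 1.75
"""

if __name__ == "__main__":
    silences = parse_silences(SAMPLE_LOG, total_duration=10.0)
    print(speech_spans(silences, total_duration=10.0))
```

Each resulting span can then be cut out with ffmpeg's `-ss` and `-to` options and passed to the transcriber on its own.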

The other problem with silent segments is the model gets stuck. It gets "locked up" (spinning beach ball, blue screen of death style). I don't think it actually dies, but it spends a long, disproportionately long, time on segments with no speech. Like I said before, it's so cute that it's so desperate to find speech everywhere, it tries really hard, and works its little legs off during silence, but to no avail.

Anyway, moving on to the next problem: the imperfect metric of silence. This caused many issues. We were chopping out quieter speech. We were including loud background noise. Both these things caused issues: the first obvious, the second, the same as we faced before: Whisper (or Whisper.cpp) would hallucinate text into these noise segments.

At last, I discovered something truly great! VAD. Voice Activity Detection is another (normally) AI technique that allows segmenting audio around voice segments. I tried a couple of Python implementations in standard speech toolkits, but none were that good. Then I found Silero VAD: an MIT-licensed (for some model versions) AI VAD model. Wonderful!
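A VAD model typically emits a speech probability per short audio frame, and a small post-processing step turns those into segments. The sketch below shows that generic step only. It is not Silero VAD's own implementation, and the threshold, frame length, and `min_silence_frames` values are assumptions.

```python
def speech_segments(probs, frame_seconds, threshold=0.5, min_silence_frames=3):
    """Turn per-frame speech probabilities into (start_s, end_s) segments.

    A segment opens at the first frame at or above threshold and closes only
    after min_silence_frames consecutive frames below it, so short pauses
    inside a sentence do not split the segment.
    """
    segments = []
    start = None      # index of the first frame of the open segment
    silent_run = 0    # consecutive below-threshold frames inside that segment
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end = i - silent_run + 1  # one past the last speech frame
                segments.append((start * frame_seconds, end * frame_seconds))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended while a segment was open; trim any trailing silence.
        end = len(probs) - silent_run
        segments.append((start * frame_seconds, end * frame_seconds))
    return segments

if __name__ == "__main__":
    # Hypothetical probabilities, one per 0.5 s frame.
    probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7, 0.6]
    print(speech_segments(probs, frame_seconds=0.5, min_silence_frames=2))
```

Unlike a loudness threshold, the decision here depends on the model's speech probability, which is why quiet speech can survive and loud non-speech noise can be dropped.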

Next problem was that it was also in Python. And I needed it to be in C++. Luckily there was a C++ example, using ONNX Runtime. (I had no idea any of these projects or tools existed mere weeks ago, and suddenly I'm knee deep!) There were a few errors, but I got rid of the bugs, and had a little command-line tool from a minimal C++ build of ONNX Runtime / Protobuf-Lite and the model. The last step was that the ONNX model needed to be converted to ORT format. Luckily there's a handy Python script to do this inside the Python release of ONNX Runtime. And, now, the VAD was super fast.

So I put all these pieces together: ffmpeg, VAD, whisper.cpp, and made a macOS app (with the correct signing and entitlements of course!) to transcribe English text from any input format: audio or video. Pretty cool, right?
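The first and last stages of such a pipeline might look like the sketch below, which only builds the command lines. whisper.cpp expects 16 kHz, mono, 16-bit WAV input, so ffmpeg is used to normalize any audio or video file first. The function name, file paths, and whisper.cpp binary name are hypothetical (the binary name has varied between releases), and the VAD step in between is omitted.

```python
def build_commands(source, wav_path, whisper_bin, model_path):
    """Return (ffmpeg_args, whisper_args) as argument lists for subprocess.run."""
    ffmpeg_args = [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", source,          # any audio or video file ffmpeg can read
        "-vn",                 # ignore video streams
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit PCM, written as WAV
        wav_path,
    ]
    # -m selects the model file and -f the input WAV for whisper.cpp.
    whisper_args = [whisper_bin, "-m", model_path, "-f", wav_path]
    return ffmpeg_args, whisper_args

if __name__ == "__main__":
    ffmpeg_args, whisper_args = build_commands(
        "talk.mp4", "talk.wav", "./whisper-cli", "ggml-base.en.bin"
    )
    print(" ".join(ffmpeg_args))
    print(" ".join(whisper_args))
```

Passing each list to `subprocess.run(args, check=True)` would run the stages in order. Argument lists avoid shell quoting problems with file names that contain spaces.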

Anyway, running Whisper on your own locally is not so easy! Much easier to sign up to the OpenAI API.

macOS app using Whisper (C++) and VAD--conveniently called: WisprNote heh :) https://apps.apple.com/app/wisprnote/id1671480366

https://news.ycombinator.com/item?id=34992012

ivan commented Aug 5, 2024

Apple Intelligence in 15.1 just flagged a phishing email as “Priority” and moved it to the top of my Inbox.

https://social.panic.com/@cabel/112905175504595751
