blaze-webinar-continuum-analytics-2014-10-08-QA
Q&A Session for Getting Started with Blaze
Session number: 665410874
Date: Wednesday 8 October 2014
Starting time: 18:48
________________________________________________________________
Flemming Stark - 19:12
Q: earlier this year blz grew apart from blaze and i don't see it here or anywhere. is blz dead?
Phillip Cloud - 19:15
A: i believe blz is replaced by bcolz https://github.com/ContinuumIO/blz/issues/16
Travis Oliphant - 19:45
A: Hi Flemming. Blz was primarily Francesc's work. He has moved forward with bcolz and we are now supporting bcolz and blz is deprecated.
________________________________________________________________
Michael Sterling - 19:13
Q: Sorry joined in late. Is there somewhere to get the notebook?
Matt T. - 19:16
A: The powerpoint is available, but I'll have to let Matt post the link again. It was on the first slide. I'm not sure if the notebook is available.
________________________________________________________________
Philip Branning - 19:21
Q: Does Blaze have a caching layer?
Phillip Cloud - 19:22
A: No there's no caching, but specific backends might have this. 
________________________________________________________________
Guillaume Gay - 19:22
Q: Hi, looks like my OS won't let me use sound with the cisco webex java app, so I don't hear anything you guys are saying. Thanks for the event anyway, I'll resort to the doc and so on to learn about Blaze!
Matt T. - 19:23
A: We're sorry to hear that. We will have the recording up on YouTube, with a link from our site, later today.
________________________________________________________________
Michael Sterling - 19:23
Q: Is the plan to keep the syntax of blaze aligned with pandas or will they just diverge over time?
Phillip Cloud - 19:26
A: This isn't a specific goal, but pandas' APIs are excellent so we'll probably keep many things. So, mostly aligned, but possibly a few differences
Phillip Cloud - 19:27
A: Here's a link to dplyr: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
________________________________________________________________
Vishal Soni - 19:27
Q: does blaze have delayed / 'lazy' computation? For example, stringing together queries before sending them to the backend engine? 
Phillip Cloud - 19:29
A: Yes, a blaze expression does no backend computation. It simply sits around in memory waiting to be translated to a backend.
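The idea of an expression that "sits around in memory" until it is computed can be sketched in plain Python. This is a minimal illustration only: the `Column`, `Op`, and `compute` names below are hypothetical stand-ins and are not Blaze's actual classes or API.

```python
# Hypothetical, simplified expression classes for illustration only;
# these are not Blaze's real classes.
class Expr:
    def __gt__(self, other):
        return Op("gt", self, other)

    def __mul__(self, other):
        return Op("mul", self, other)


class Column(Expr):
    def __init__(self, name):
        self.name = name


class Op(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right


calls = []  # records every time real data is touched


def compute(expr, row):
    """Walk the expression tree and evaluate it against one row (a dict)."""
    if isinstance(expr, Column):
        calls.append(expr.name)
        return row[expr.name]
    if isinstance(expr, Op):
        left = compute(expr.left, row)
        right = compute(expr.right, row)
        return left > right if expr.op == "gt" else left * right
    return expr  # a plain Python constant


amount = Column("amount")
expr = (amount * 2) > 100                # builds a tree; touches no data
assert calls == []                       # nothing has been computed yet
result = compute(expr, {"amount": 75})   # work happens only here
```

Building `expr` only allocates small tree nodes. The data is read when `compute` is called, which is the point at which a real system would translate the tree for a backend.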
________________________________________________________________
Alberto Andrade-Fraga - 19:27
Q: Does blaze assume the DB schema has been created or can it also migrate a DB from one implementation to another?
Phillip Cloud - 19:30
A: Do you mean say from postgres to sql?
Phillip Cloud - 19:30
A: sorry postgres to mysql
________________________________________________________________
Alberto Andrade-Fraga - 19:31
Q: Yes, let's assume you have a DB in MSSQL and want to try creating a copy of the DB in MongoDB?
Phillip Cloud - 19:32
A: Yes, I believe we have support for this and if we don't this is definitely a goal. This would be done with the into function.
Phillip Cloud - 19:39
A: I should note that I don't think we migrate things like indexes.
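To illustrate what "data moves, indexes do not" looks like, here is a rough sketch using only the standard library. It does not use Blaze's into function. An in-memory SQLite database stands in for the SQL source, a list of dicts stands in for a document collection, and `copy_rows` is a hypothetical helper.

```python
import sqlite3

# Source: an in-memory SQLite database standing in for the SQL side.
src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, balance REAL)"
)
src.execute("CREATE INDEX idx_name ON accounts (name)")
src.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "alice", 100.0), (2, "bob", 250.5)],
)


def copy_rows(conn, table):
    """Hypothetical helper: read every row of `table` as a list of dicts.

    `table` is interpolated into the query, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]


# Target: a plain list of dicts standing in for a document collection.
# Only the rows are copied; the idx_name index is not recreated anywhere.
documents = copy_rows(src, "accounts")
```

Any index on the target side would have to be created separately, using whatever mechanism the target store provides.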
________________________________________________________________
Vishal Soni - 19:32
Q: is there a blaze api & backend for dense array data with metadata? Like larry or x-ray (i.e. ndarrays with axis labels)? 
Phillip Cloud - 19:34
A: As of now, no, though this is a medium term goal. We have a PR up for SciDB and we talk to Stephan Hoyer (of xray) fairly regularly. We've mostly been focused on Table objects but Arrays are on the roadmap
________________________________________________________________
Drew Newman - 19:33
Q: Are there plans to support Cassandra as a back end?
Phillip Cloud - 19:35
A: I don't think we will implement this ourselves, but if a person wanted to write a backend for cassandra, then we would happily review and most likely accept a pr.
Phillip Cloud - 19:36
A: i believe cassandra uses a variant of sql, so if you had a separate python package that implements a sqlalchemy dialect, we could very easily help you plug this in to blaze
Phillip Cloud - 19:39
A: There's a package called impyla that implements a sqlalchemy dialect for the cloudera impala package.
________________________________________________________________
Philip Branning - 19:36
Q: What does blaze do when you try to pull down something that's too big to fit in memory?
Phillip Cloud - 19:37
A: Whatever Python itself does. There's no inspection of memory capacity or leveraging of that information as of now
________________________________________________________________
Philip Branning - 19:44
Q: Yeah, is there some idea of a streaming iterator type of thing for into?
Phillip Cloud - 19:44
A: Yes there's a "chunked" interface, that does exactly this
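The general shape of a chunked, streaming copy can be sketched in plain Python. This is not Blaze's "chunked" interface: `chunks` and `copy_in_chunks` are hypothetical names, and a list stands in for the destination.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def copy_in_chunks(source, sink, size):
    """Append `source` to `sink` one chunk at a time; return the chunk count."""
    n = 0
    for block in chunks(source, size):
        sink.extend(block)   # only one block is held in memory at a time
        n += 1
    return n


rows = (i * i for i in range(10))   # a generator: never fully materialized
target = []
n_chunks = copy_in_chunks(rows, target, size=4)
```

With ten items and a chunk size of four, the copy proceeds in three steps (4, 4, and 2 items), so peak memory is bounded by the chunk size rather than by the size of the source.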
________________________________________________________________
Flemming Stark - 19:27
Q: Some features of blaze are now similar to features of IOPro. Is that the approach, or are there differences?
Travis Oliphant - 19:47
A: IOPro is an optimized interface for bringing data into memory. Blaze could *use* IOPro to quickly load data into NumPy arrays or Pandas DataFrames.
________________________________________________________________
Kerry Oliphant - 19:41
Q: Do you have some time today to talk? This presentation has spurred some thinking.
Travis Oliphant - 19:48
A: Yes, give me a call 
________________________________________________________________
Oleg Mürk - 19:49
Q: Can Blaze do predicate push-down and map pruning (chunk value range filtering) with SparkSQL?
Phillip Cloud - 19:50
A: Is this something like filter(f, map(g, sequence))?
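For readers unfamiliar with the term, predicate push-down on the filter(f, map(g, sequence)) pattern can be sketched as follows. This is a plain-Python illustration with made-up data and functions. It assumes the predicate `f` only reads a field that `g` leaves unchanged, and it says nothing about what Blaze or SparkSQL actually do.

```python
records = [{"id": i, "value": i * 10} for i in range(6)]

counter = {"g": 0}  # counts how often the map step runs


def g(rec):
    """An 'expensive' map step that adds a derived field."""
    counter["g"] += 1
    return dict(rec, doubled=rec["value"] * 2)


def f(rec):
    """A predicate that only looks at a field g leaves unchanged."""
    return rec["value"] >= 30


# Naive plan: map everything, then filter.
naive = list(filter(f, map(g, records)))
naive_calls = counter["g"]

# Pushed-down plan: filter first, so g runs on fewer records.
counter["g"] = 0
pushed = list(map(g, filter(f, records)))
pushed_calls = counter["g"]
```

Both plans return the same records, but the pushed-down plan runs `g` only on the rows that survive the filter. The rewrite is valid here only because `f` does not depend on anything `g` computes.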
________________________________________________________________
rich fernandez - 19:49
Q: Is blaze trying to be a superset of sqlalchemy, to some extent?
Travis Oliphant - 19:50
A: Blaze is similar in spirit to sqlalchemy, but calling it TableAlchemy and ArrayAlchemy would better capture the spirit. 
Travis Oliphant - 19:50
A: Also, Blaze *uses* sqlalchemy under the hood for its SQL interface.
________________________________________________________________
Stephen Larroque - 19:52
Q: How does (or will) blaze manage complex computations on big data, like matrix multiplication, that aren't necessarily implemented in the database? Can it transparently streamline to numpy in an out-of-core fashion?
Phillip Cloud - 19:55
A: We don't have a way to choose a particular backend for streaming operations, but there's an open discussion about this https://github.com/ContinuumIO/blaze/issues/698
________________________________________________________________
Oleg Mürk - 19:54
Q: Does Blaze/Blosc do something similar to PyTables' OPSI indexes?
Phillip Cloud - 19:57
A: blaze indexes depend on whether the backend supports them, so e.g. you can do create_index(tables.Table) to create a fully sorted index
________________________________________________________________
Vishal Soni - 19:59
Q: How does the developer of a data library create a blaze backend? Do they commit it to the blaze codebase, or can they just package it in a standalone manner with their library?
Phillip Cloud - 20:00
A: Generally a more obscure backend would probably be better off as a separate package, but of course obscure is open to discussion
________________________________________________________________
Vishal Soni - 20:00
Q: In the future, how do you envision maintenance of various backends? Centralized at continuum vs. distributed across libraries?
Phillip Cloud - 20:01
A: Good question, I don't think we have a solid plan for how to deal with this yet, though we've discussed a little bit
________________________________________________________________
Philip Branning - 20:00
Q: I understand query optimization would generally be a backend-specific concern. But some query optimization should make sense at the level of Blaze. Is there any query optimization currently?
Phillip Cloud - 20:04
A: This hasn't received a ton of attention, in favor of getting a core api and infrastructure in place, but there are some old issues discussing things like constant folding and other optimizations. we are thinking about these but they are low priority
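As a reminder of what constant folding means at the expression level, here is a small sketch over a toy expression format. The nested-tuple representation and the `fold` function are assumptions made for illustration, not Blaze's expression system.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul, "sub": operator.sub}


def fold(expr):
    """Recursively replace operations on two constants with their value.

    Expressions are nested tuples (op, left, right); strings are symbols
    (e.g. column names) and numbers are constants.
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)


# (2 + 3) * x folds to 5 * x; the symbol x stays untouched.
folded = fold(("mul", ("add", 2, 3), "x"))

# (4 * 5) - 1 contains no symbols, so it folds to a single constant.
fully_constant = fold(("sub", ("mul", 4, 5), 1))
```

An optimization like this works on the expression alone, before any backend is involved, which is why it is a natural candidate for a backend-independent layer.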
________________________________________________________________
Drew Newman - 20:04
Q: Will this chat transcript be available after the presentation ends today? There is a lot of good information here.
Phillip Cloud - 20:06
A: If you're on the Event Center application you can save the chat
Phillip Cloud - 20:06
A: There's a recording that I think will have this as well, maybe Lila or MattT can comment
________________________________________________________________
Oleg Mürk - 20:07
Q: Do you currently integrate with SciDB?
Phillip Cloud - 20:07
A: Nope, though Chris Beaumont has a PR up https://github.com/ContinuumIO/blaze/pull/681
________________________________________________________________
Oleg Mürk - 20:11
Q: In the PySpark integration, what exactly is the function of Blaze/Blosc?
Phillip Cloud - 20:13
A: Blaze drives PySpark (blaze provides an API to call the API of PySpark) and Blosc is an algorithm orthogonal to the goals of blaze. 