
## blaze-webinar-continuum-analytics-2014-10-08-QA.txt
Q&A Session for Getting Started with Blaze

Session number: 665410874
Date: Wednesday, 8 October 2014
Starting time: 18:48

________________________________________________________________

Flemming Stark - 19:12
Q: Earlier this year blz split off from Blaze, and I don't see it here or anywhere. Is blz dead?
	Phillip Cloud - 19:15
	A: I believe blz has been replaced by bcolz: https://github.com/ContinuumIO/blz/issues/16
	Travis Oliphant - 19:45
	A: Hi Flemming.  Blz was primarily Francesc's work.  He has moved forward with bcolz and we are now supporting bcolz and blz is deprecated.
________________________________________________________________

Michael Sterling - 19:13
Q: Sorry, I joined late. Is there somewhere to get the notebook?
	Matt T. - 19:16
	A: The PowerPoint is available, but I'll have to let Matt post the link again. It was on the first slide. I'm not sure if the notebook is available.
________________________________________________________________

Philip Branning - 19:21
Q: Does Blaze have a caching layer?
	Phillip Cloud - 19:22
	A: No, there's no caching, but specific backends might have it.
________________________________________________________________

Guillaume Gay - 19:22
Q: Hi, it looks like my OS won't let me use sound with the Cisco WebEx Java app, so I can't hear anything you're saying. Thanks for the event anyway; I'll turn to the docs to learn about Blaze!
	Matt T. - 19:23
	A: We're sorry to hear that. We will have the recording up on YouTube, with a link from our site, later today.
________________________________________________________________

Michael Sterling - 19:23
Q: Is the plan to keep the syntax of blaze aligned with pandas or will they just diverge over time?
	Phillip Cloud - 19:26
	A: This isn't a specific goal, but pandas' APIs are excellent, so we'll probably keep many of them. So: mostly aligned, but possibly with a few differences.
	Phillip Cloud - 19:27
	A: Here's a link to dplyr: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
________________________________________________________________

Vishal Soni - 19:27
Q: Does Blaze have delayed/'lazy' computation? For example, stringing together queries before sending them to the backend engine?
	Phillip Cloud - 19:29
	A: Yes. A Blaze expression does no backend computation; it simply sits in memory waiting to be translated to a backend.
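To make the idea of a lazy expression concrete, here is a minimal sketch in plain Python. It is not Blaze's API: the Expr class, compute_python and to_sql are hypothetical names. Chaining .where() and .select() only records the query. The same expression is then handed to two toy "backends": an in-memory list of dicts, and SQLite through the standard-library sqlite3 module.

```python
import operator
import sqlite3

# Hypothetical operator table shared by both toy backends below.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


class Expr:
    """A lazy, backend-agnostic query description (illustrative, not Blaze's API)."""

    def __init__(self, table, filters=(), fields=None):
        self.table = table
        self.filters = tuple(filters)  # (field, op, value) triples
        self.fields = fields           # None means "all columns"

    def where(self, field, op, value):
        # Returns a new expression; no data is touched.
        return Expr(self.table, self.filters + ((field, op, value),), self.fields)

    def select(self, *fields):
        return Expr(self.table, self.filters, fields)


def compute_python(expr, rows):
    """Backend 1: evaluate the expression against a list of dicts."""
    out = []
    for row in rows:
        if all(OPS[op](row[f], v) for f, op, v in expr.filters):
            fields = expr.fields or tuple(row)
            out.append(tuple(row[f] for f in fields))
    return out


def to_sql(expr):
    """Backend 2: translate the expression into a parameterized SQL query.

    Identifiers are interpolated unescaped, so only use trusted names here.
    """
    cols = ", ".join(expr.fields) if expr.fields else "*"
    sql = f"SELECT {cols} FROM {expr.table}"
    params = [v for _, _, v in expr.filters]
    if expr.filters:
        sql += " WHERE " + " AND ".join(f"{f} {op} ?" for f, op, _ in expr.filters)
    return sql, params


rows = [
    {"name": "Alice", "amount": 100},
    {"name": "Bob", "amount": -50},
    {"name": "Carol", "amount": 250},
]
# Building the expression only records the query; nothing is computed yet.
expr = Expr("accounts").where("amount", ">", 0).select("name")

print(compute_python(expr, rows))  # [('Alice',), ('Carol',)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (:name, :amount)", rows)
sql, params = to_sql(expr)
print(sorted(conn.execute(sql, params).fetchall()))  # [('Alice',), ('Carol',)]
```

The expression object holds no data and can be reused. Only the backend-specific functions decide how and where the work happens.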
________________________________________________________________

Alberto Andrade-Fraga - 19:27
Q: Does Blaze assume the DB schema has already been created, or can it also migrate a DB from one implementation to another?
	Phillip Cloud - 19:30
	A: Do you mean, say, from Postgres to SQL?
	Phillip Cloud - 19:30
	A: Sorry, Postgres to MySQL.
________________________________________________________________

Alberto Andrade-Fraga - 19:31
Q: Yes. Let's say you have a DB in MSSQL and want to create a copy of it in MongoDB?
	Phillip Cloud - 19:32
	A: Yes, I believe we have support for this, and if we don't, it is definitely a goal. This would be done with the into function.
	Phillip Cloud - 19:39
	A: I should note that I don't think we migrate things like indexes.
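The into function mentioned above converts data into a target container chosen by type. The sketch below shows that idea in plain Python. The Rows and Columns types, the converter registry and the breadth-first search over converter chains are all hypothetical, and this is not Blaze's actual implementation. As with the migration caveat above, only the records move; nothing like an index is carried across.

```python
import csv
import io
from collections import deque
from dataclasses import dataclass


@dataclass
class Rows:
    """Row-oriented records, e.g. what a SQL cursor returns (hypothetical type)."""
    names: tuple
    data: list


@dataclass
class Columns:
    """Column-oriented records, e.g. a DataFrame-like dict of lists (hypothetical type)."""
    data: dict


# Hypothetical registry of direct converters, keyed by (source type, target type).
CONVERTERS = {}


def converter(source, target):
    def register(func):
        CONVERTERS[(source, target)] = func
        return func
    return register


@converter(Rows, Columns)
def rows_to_columns(rows):
    return Columns({n: [r[i] for r in rows.data] for i, n in enumerate(rows.names)})


@converter(Columns, Rows)
def columns_to_rows(cols):
    return Rows(tuple(cols.data), list(zip(*cols.data.values())))


@converter(Columns, str)
def columns_to_csv(cols):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(list(cols.data.keys()))
    writer.writerows(zip(*cols.data.values()))
    return buf.getvalue()


def into(target, data):
    """Convert `data` to `target` by chaining converters found with a BFS."""
    queue = deque([(type(data), [])])
    seen = {type(data)}
    while queue:
        current, path = queue.popleft()
        if current is target:
            for step in path:
                data = step(data)
            return data
        for (src, dst), func in CONVERTERS.items():
            if src is current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [func]))
    raise TypeError(f"no conversion path from {type(data).__name__} to {target.__name__}")


rows = Rows(("name", "amount"), [("Alice", 100), ("Bob", -50)])
print(into(Columns, rows))      # one hop: Rows -> Columns
print(into(str, rows), end="")  # two hops: Rows -> Columns -> CSV text
```

Registering pairwise converters and searching for a path means that adding one new format only requires converters to and from a single existing format.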
________________________________________________________________

Vishal Soni - 19:32
Q: Is there a Blaze API and backend for dense array data with metadata, like larry or xray (i.e. ndarrays with axis labels)?
	Phillip Cloud - 19:34
	A: As of now, no, though this is a medium-term goal. We have a PR up for SciDB, and we talk to Stephan Hoyer (of xray) fairly regularly. We've mostly been focused on Table objects, but Arrays are on the roadmap.
________________________________________________________________

Drew Newman - 19:33
Q: Are there plans to support Cassandra as a back end?
	Phillip Cloud - 19:35
	A: I don't think we will implement this ourselves, but if someone wanted to write a backend for Cassandra, we would happily review and most likely accept a PR.
	Phillip Cloud - 19:36
	A: I believe Cassandra uses a variant of SQL, so if you had a separate Python package that implements a SQLAlchemy dialect, we could very easily help you plug it into Blaze.
	Phillip Cloud - 19:39
	A: There's a package called impyla that implements a SQLAlchemy dialect for Cloudera Impala.
________________________________________________________________

Philip Branning - 19:36
Q: What does blaze do when you try to pull down something that's too big to fit in memory?
	Phillip Cloud - 19:37
	A: Whatever Python itself does. There's currently no inspection of memory capacity, and no use of that information.
________________________________________________________________

Philip Branning - 19:44
Q: Yeah, is there some idea of a streaming iterator type of thing for into?
	Phillip Cloud - 19:44
	A: Yes, there's a "chunked" interface that does exactly this.
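The general pattern behind a chunked interface is to pull fixed-size pieces from an iterator and process each one before fetching the next. Only one chunk is held in memory at a time. The sketch below shows that pattern with the standard library. The chunks and running_mean helpers are hypothetical and are not Blaze's chunked API.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items, consuming the source lazily."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def running_mean(values, chunksize=1000):
    """Hypothetical out-of-core aggregate: only one chunk is materialized at a time."""
    total = 0
    count = 0
    for chunk in chunks(values, chunksize):
        total += sum(chunk)
        count += len(chunk)
    return total / count


source = (x for x in range(1_000_000))  # stands in for a large on-disk source
print(running_mean(source, chunksize=10_000))  # 499999.5
```

Reductions such as sum and count combine cleanly across chunks, which is why they suit this style. Operations such as a global sort need more machinery.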
________________________________________________________________

Flemming Stark - 19:27
Q: Some features of Blaze are now similar to features of IOPro. Is that the approach, or are there differences?
	Travis Oliphant - 19:47
	A: IOPro is an optimized interface for bringing data into memory.  Blaze could *use* IOPro to quickly load data into NumPy arrays or Pandas DataFrames.
________________________________________________________________

Kerry Oliphant - 19:41
Q: Do you have some time today to talk? This presentation has spurred some thinking.
	Travis Oliphant - 19:48
	A: Yes, give me a call
________________________________________________________________

Oleg Mürk - 19:49
Q: Can Blaze do predicate push-down and map pruning (chunk value range filtering) with SparkSQL?
	Phillip Cloud - 19:50
	A: Is this something like filter(f, map(g, sequence))?
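For context on the terms in the question: predicate push-down means applying a filter as close to the stored data as possible, so rows are discarded before later work such as a map. The "chunk value range filtering" the question mentions can be illustrated with per-chunk min/max statistics that let a scan skip chunks which cannot match. The sketch below is a hypothetical plain-Python illustration of that pruning idea. It is not how Blaze or SparkSQL implements it.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    values: list
    # Per-chunk statistics, computed once when the data is written.
    lo: float
    hi: float


def make_chunks(values, size):
    """Split values into chunks and record each chunk's min and max."""
    out = []
    for start in range(0, len(values), size):
        part = values[start:start + size]
        out.append(Chunk(part, min(part), max(part)))
    return out


def scan_greater_than(chunks, threshold):
    """Return values > threshold, skipping chunks whose max rules them out."""
    result, scanned = [], 0
    for chunk in chunks:
        if chunk.hi <= threshold:  # no value in this chunk can match
            continue
        scanned += 1
        result.extend(v for v in chunk.values if v > threshold)
    return result, scanned


data = list(range(100))  # e.g. monotonically increasing timestamps
matches, scanned = scan_greater_than(make_chunks(data, 10), 85)
print(len(matches), scanned)  # 14 2  (only 2 of 10 chunks are read)
```

Pruning pays off most when the data is roughly sorted on the filtered column, because then each chunk covers a narrow value range.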
________________________________________________________________

rich fernandez - 19:49
Q: Is blaze trying to be a superset of sqlalchemy, to some extent?
	Travis Oliphant - 19:50
	A: Blaze is similar in spirit to SQLAlchemy, but names like TableAlchemy and ArrayAlchemy would better capture what it does.
	Travis Oliphant - 19:50
	A: Also, Blaze *uses* SQLAlchemy under the hood for its SQL interface.
________________________________________________________________

Stephen Larroque - 19:52
Q: How does (or will) Blaze manage complex computations on big data, like matrix multiplication, that aren't necessarily implemented in the database? Can it transparently stream to NumPy in an out-of-core fashion?
	Phillip Cloud - 19:55
	A: We don't have a way to choose a particular backend for streaming operations, but there's an open discussion about this https://github.com/ContinuumIO/blaze/issues/698
________________________________________________________________

Oleg Mürk - 19:54
Q: Does Blaze/Blosc do something similar to PyTables' OPSI indexes?
	Phillip Cloud - 19:57
	A: Indexing in Blaze depends on whether the backend supports it; e.g., you can do create_index(tables.Table) to create a fully sorted index.
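A fully sorted index stores every (value, row id) pair in value order. A range query then needs only two binary searches plus a slice, with no full scan. The sketch below shows that idea with the standard-library bisect module. The SortedIndex class is hypothetical and does not reflect how PyTables or Blaze lay out their indexes on disk.

```python
import bisect


class SortedIndex:
    """Hypothetical completely sorted index over one column."""

    def __init__(self, column):
        # Sort (value, row_id) pairs once; ties are ordered by row id.
        pairs = sorted((value, row) for row, value in enumerate(column))
        self.keys = [value for value, _ in pairs]
        self.rows = [row for _, row in pairs]

    def range(self, lo, hi):
        """Row ids with lo <= value < hi, found by two binary searches."""
        start = bisect.bisect_left(self.keys, lo)
        stop = bisect.bisect_left(self.keys, hi)
        return self.rows[start:stop]


column = [42, 7, 19, 7, 88, 23]
index = SortedIndex(column)
print(index.range(7, 24))  # [1, 3, 2, 5]  (rows holding 7, 7, 19, 23)
```

Building the index costs one sort up front. Each range query then costs O(log n) plus the size of the result.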
________________________________________________________________

Vishal Soni - 19:59
Q: How does the developer of a data library create a Blaze backend? Do they commit it to the Blaze codebase, or can they package it standalone with their library?
	Phillip Cloud - 20:00
	A: Generally, a more obscure backend would probably be better off as a separate package, but of course what counts as obscure is open to discussion.
________________________________________________________________

Vishal Soni - 20:00
Q: In the future, how do you envision maintenance of the various backends? Centralized at Continuum, or distributed across libraries?
	Phillip Cloud - 20:01
	A: Good question. I don't think we have a solid plan for dealing with this yet, though we've discussed it a little.
________________________________________________________________

Philip Branning - 20:00
Q: I understand query optimization would generally be a backend-specific concern.  But some query optimization should make sense at the level of Blaze.  Is there any query optimization currently?
	Phillip Cloud - 20:04
	A: This hasn't received a ton of attention, in favor of getting a core API and infrastructure in place, but there are some old issues discussing things like constant folding and other optimizations. We are thinking about these, but they are low priority.
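Constant folding, one of the optimizations mentioned above, evaluates any subexpression that involves no data once, at planning time, so it is not recomputed for every row. Below is a minimal bottom-up sketch over a toy expression tree. The Const, Column and BinOp classes and the fold function are hypothetical and are not Blaze's expression types.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Column, BinOp]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Bottom-up constant folding: evaluate subtrees that reference no columns."""
    if isinstance(expr, BinOp):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr  # Const and Column leaves are already as simple as they get


# amount * (60 * 60 * 24): the constant part collapses to a single number
expr = BinOp("*", Column("amount"), BinOp("*", BinOp("*", Const(60), Const(60)), Const(24)))
print(fold(expr))  # BinOp(op='*', left=Column(name='amount'), right=Const(value=86400))
```

Because the rewrite works only on the expression tree, it runs before any backend is chosen and benefits every backend equally.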
________________________________________________________________

Drew Newman - 20:04
Q: Will this chat transcript be available after the presentation ends today? There is a lot of good information here.
	Phillip Cloud - 20:06
	A: If you're in the Event Center application, you can save the chat.
	Phillip Cloud - 20:06
	A: There's a recording that I think will have this as well; maybe Lila or Matt T. can comment.
________________________________________________________________

Oleg Mürk - 20:07
Q: Do you currently integrate with SciDB?
	Phillip Cloud - 20:07
	A: Nope, though Chris Beaumont has a PR up https://github.com/ContinuumIO/blaze/pull/681
________________________________________________________________

Oleg Mürk - 20:11
Q: In the PySpark integration, what exactly is the role of Blaze/Blosc?
	Phillip Cloud - 20:13
	A: Blaze drives PySpark (blaze provides an API to call the API of PySpark) and Blosc is an algorithm orthogonal to the goals of blaze.