
@omar-3
Last active April 3, 2020 12:16
My GSoC 2020 proposal for Chapel Programming Language

Functional Programming Module

This proposal is an extension of the itertools project: I propose an implementation of a complete-ish functional programming module, designed with the current state of the Chapel compiler in mind.

Table of contents

  1. Introduction
  2. Contact
  3. Coding Experience
  4. Survey
  5. Prerequisites
  6. Self-assessment
  7. The Task
  8. Expected Outcomes
    1. First Milestone
    2. Second Milestone
    3. Third Milestone

Introduction

In a few sentences, describe your overall background and current studies.

I'm a junior student at Zewail University of Science and Technology, majoring in information engineering with a focus on computer architecture and artificial intelligence; it is kind of a double track at my university. I have taken fairly advanced courses in many computer science fields: information theory, optimization, advanced algorithms, etc. I would like to continue my graduate studies in compiler design; despite not having compiler courses in my curriculum, the field has always amazed me: it touches almost everything in the "computer stack", from memory management and computer architecture to ingenious string and parsing algorithms. I was a biological sciences major for one year, but I caught the computer science bug in a discrete mathematics class; that is why I'm a tad old.

Briefly describe any relevant coursework for the project you are proposing.

I'm proposing a functional programming module like Python's, consisting of itertools, functools, and operator sub-modules. This project doesn't require specialized knowledge of a particular field, unlike the Parser project, but it does require good problem-solving and algorithm skills to deal with the hurdles of parallel iterators and the implementation of the functools utilities. Almost all of my courses require serious programming skills and very fast adaptation to new languages and tools.

Describe why you wish to participate in the Google Summer of Code.

Contributing to a big open source project is very beneficial, not so much because of the contribution itself: reading other people's code and seeing how they approach problems is the most important part for a budding developer like myself. Spending 3-4 months with a single organization makes you iterate over the organization's code again and again, dig deeper, and absorb the ideas behind certain design decisions as much as possible. There is also the mentorship: someone years ahead of me in experience guiding me and pointing me to things I didn't even know existed.

Tell us why you wish to work with the Chapel project in particular.

Chapel is a new language, and it isn't every day that you see a serious programming language being built and matured, with obviously very knowledgeable people running the development. This phase of development, where people are writing the standard libraries and other utilities, is very aligned with my study objectives right now; there is a lot to be learned from the state of the Chapel community. I started coding only 2-3 years ago, and it wouldn't be very beneficial for me to jump into the really advanced fields without knowing how things REALLY work behind the scenes. The few modules I have read so far from the source code make sense to me and give me new insights and new things to be concerned about, unlike the source code of the C++ standard library, for instance, which takes a counterintuitive approach to its problems.

What do you hope to learn over the summer?

I'm replicating the work of about 5-6 functional programming libraries written in multiple languages: Python's functools, more-itertools, and toolz, C++'s Boost functional libraries, and JavaScript's underscore.js. It is a really good exercise to see how these are written and designed. I wanted to get into the functional programming paradigm anyway, and I intend to study Haskell/Scala concurrently with my GSoC work in the summer. I want to learn how to structure and write production-ready code, keeping in mind that my project, upon completion, will be a fairly big module with a lot of dependencies and utilities. The state of the Chapel compiler imposes certain limitations, either because of its design or because of features not yet implemented, which forces me to search for the cause of each problem and hopefully learn new compiler things, like chapel-lang/chapel#15234.

One of the most beneficial side effects is providing feedback on the language design; stirring up these conversations would help me learn and improve my communication skills.

How well can you comprehend and understand English? How strong is your written English?

My comprehension of English is perfect-ish. My writing is not top-notch, but good enough.

Do you have any other commitments for the summer period? Do you have planned vacations?

I don't have any commitments nor vacations planned.

Contact

What email address and GitHub username will you be using to communicate with us?

Email: s-omarkhaled@zewailcity.edu.eg GitHub handle: omar-3

What time zone do you live in? Will that change during the summer? Describe in UTC+x or -x.

UTC+2

What times will you be comfortable working? How much earlier could you start your day and how much later could you stay if it helped you to communicate with other developers?

I don't have any commitments this summer, so I can start as early or as late as is convenient for my mentor.

Coding Experience

Describe your experience with Chapel, C, C++, and any other programming experience you wish to mention.

  1. Chapel: I have created a mason package in which I wrote a very quick-and-dirty implementation of the proposed functional programming module, with tests and some examples; check it out here. Each method has its own gist and runnable script; the links to these gists are spread throughout the proposal and the task section. The only advantage of this mason package over those scripts is that it uses the Unit Testing module and has proper documentation.

  2. C/C++: I have implemented a terminal-based text editor in C without using ncurses; instead I used the termios library and ANSI/VT100 terminal control escape sequences. It was the first time I implemented something non-trivial in C and seriously used memory-management functions. As for C++, I have never done anything serious with it beyond 2-3 university projects, and I'm aware of the enormous gap in my knowledge without studying C++14/17/20.

  3. Python: We use it at university as much as possible. I did my information theory course in it, implementing the JPEG standard and a subset of the H.264 standard, which includes the arithmetic encoding algorithm; I did my linear/non-linear programming class in it, implementing the simplex and interior-point algorithms; and I did my artificial intelligence class in it and contributed to the official textbook repository (here is the PR: aima-code/aima-python#1148). It isn't much, but it is my first and only contribution to an open source project, and I just wanted to do anything even remotely related to Peter Norvig.

  4. Other than all that, I really like bash. I started the typical algorithms and competitive programming repository everyone makes, implementing the common algorithms in bash and many other languages. I can write serious bash scripts, and I'm able to read and comprehend high-profile bash projects.

  5. I like reading a lot, and being lucky enough to be in a field like ours, exposed to a massive number of books that are really well written (not just technically, but the writing itself), is one of the most enjoyable things for me. I like to implement what I read in a different language than the one the author uses, to gain a deeper understanding of the topic rather than just copying and running the code, like this toy compiler. That has also made me relatively flexible when working with new technologies.

Describe any experience with compiler development, parallel computing, or any other knowledge you know will be useful for the task.

I don't have experience writing libraries in any language. As for the parallel computing part, I have a good grasp of threading concepts, messaging techniques, and synchronization issues from my OS courses. I have also played a bit with CUDA and OpenMP, but nothing I can call a project.

Are you familiar with these tools? (Familiarity with these tools is not necessarily required for all projects but we will need to know how to guide you)

git

I'm very comfortable with it, and I know where to look if things go south.

make - gdb and valgrind - gcc

I'm comfortable enough to write Makefiles for fairly big projects. I can use gdb, but not very comfortably. I know that valgrind exists for diagnosing memory and cache issues. As for gcc, I certainly know how to compile C files and link libraries, and I understand when to apply optimization flags; beyond that, not so much.

What experience do you have being part of a development team?

I have interned once in a research lab at my university, developing an online portal for using FPGAs and other equipment remotely for educational purposes. The team consisted of 3 students and a research assistant; the task was a web page with an embedded terminal controlling the Linux server that the FPGAs were connected to.

What is the biggest project you have worked on as a software developer? What did you learn in that project? What was your role in that project over time?

The aforementioned internship is the biggest project I have worked on. I learnt some exotic hardware stuff, and Linux internals: how it manages processes and memory. The nature of this kind of internship is that you don't have a specific role; everyone tries to learn as much as possible, but my part was the web development and a little Linux system administration. When I applied to this internship, I wanted to do something I already could do (web development) in an environment uncomfortable for me (a hardware lab), in order to learn more.

Is any of the code you have written already open source? Can you point us to some code you have written?

It is not a unique project but a set of solutions for an algorithms textbook I adore; as far as I have searched, there are absolutely no solutions or implementations for this textbook. Here is the repo. It is a big book and I started it a month or so ago, but a considerable amount is already done.

What have you already contributed to the Chapel project? Please list pull request and issue numbers.

I have three open issues: #15234, #15375, and #15230; the last one is not mine, but it is based on my gist. As I mentioned above, contributing feedback on the language design through issues would be a major part of my project, because 90% of the code to be written is based on existing implementations and algorithms; the challenge is how to cast that 90% into Chapel.

Survey

Had you heard about Chapel before the Summer of Code? If so, where? If not, where would you advise us to advertise?

I heard the name once from students taking a parallel computing course at my university. I would advertise at universities: with its cool name, its objectives, and its positioning as "the Python of parallel computing", it would catch on.

What was the first question concerning Chapel that you could not find an answer to quickly?

Why does compilation for trivial scripts take a noticeably long time?

What will keep you actively engaged with the Chapel community after this summer is over?

I'm at a phase in my software career where I need to examine how basic things are built and what algorithms run under the hood of other people's work, rather than using libraries as black boxes. Chapel is perfect for that, because the standard libraries are still maturing and there is a fairly large number of modules that must be implemented. It is an amazing opportunity to carry what I study about the internals of other languages over to Chapel.

Are you applying to any other organizations for this year’s Google Summer of Code? If so, what is the order of your preference, in case you are accepted to multiple organizations?

No. I really wanted to get this application as ready as I could, because the project's objectives and outcomes are perfectly aligned with what I want to do right now.

Prerequisites

What operating system(s) do you work with?

Kubuntu: it gives you a very friendly UI like Windows, plus your bash shell like Linux and OS X.

  • Are you able to install software on the computer you plan to use?
  • Will you have access to a computer with an internet connection for your development?

Yes to both.

Self-assessment

What does useful criticism look like from your point of view as a committing student?

Telling me how my code could break and where I should be careful so that it doesn't, rather than spoon-feeding me the exact solution. I like honest opinions; if I did something really wrong, I would like to know.

What techniques do you use to give constructive advice? How do you best like to receive constructive feedback?

Constructive advice should help the other person construct the solution, not hand it over and dictate that this and that should be done in such a way. I would like coding feedback to be crammed with information and things the mentor is excited to share with me.

What is your development style? Do you prefer to figure out/discuss changes before you start coding? Or do you prefer to code a proof-of-concept to see how it turns out?

It differs depending on the task at hand: if it needs the design of an algorithm, I make multiple drafts on paper before writing a single line of code; in other cases experimentation is a must, such as implementing obvious things where you need to squeeze a bit of efficiency out of a piece of code.

The Task

Describe the task you intend to work on. If it’s one of the tasks from our ideas list, let us know which elements of it you want most to focus on, if you know. If you are proposing a new task, describe the task and who you have already discussed it with.

The goal of the entire module is to provide utilities for composing solutions in an elegant and expressive way; this module is truly a gem in any developer's repertoire. Elegant structures for iteration: you have itertools. Efficient programming: you have functools. An explicit, named form of the normal operators: you have the operator module.

I intend to have a complete implementation of a not-so-exact replica of Python's functional programming modules: the itertools module, a small, most-useful subset of the functools module, and the operator module. There are other methods I have in mind that would be very useful, from more-itertools, underscore.js, and some very interesting object wrappers from Boost. I discussed the idea with Mr. Ben Albrecht in the very first days of GSoC, in this gist here.


Chapel compiler state

The Chapel compiler currently has some limitations, due to not-yet-implemented features and design decisions, and 50% of my time while writing this proposal went into learning what the Chapel compiler can and cannot do in comparison with my main reference, Python. I tried one or two things from each category of utilities to see what can and cannot be done with the current compiler state.


Summary of all issues and proposed solutions

  • Packing and un-packing of function inputs: This feature is needed because in iter starmap you take in the input variables as one chunk per array element, and in lru_cache you need to hash a certain input as one entity.

    • Luckily, Chapel has tuple expansion, which is perfect for the iter starmap method, because we have the inputs for a certain call as a tuple. Here is a gist showing this in action, with parallel and serial implementations of the function.
    • In the lru_cache utility you need to treat the input of a function as a single blob of data that can be hashed and referred to. Here is a gist with a complete-ish implementation of the utility, where I need to pass the inputs to the utility as a tuple, which looks a bit weird.
  • Functions in Chapel are not really first-class citizens: This issue would hinder some very functional-y utilities like function currying and binders like this. There is no way around that; check chapel-lang/chapel#15234.

  • Parallel iterators for combinatoric iterators: In normal, typical parallelized iter methods, I just use RangeChunk methods to partition the given domain, and that's it, maybe with some alterations to the algorithm itself, like in the unnecessary iter count. Now, what is the issue with generating permutations, for example? We don't have a "range" to partition and pass to the follower iterators.

    • Here is what I propose: we generate permutations in lexicographic order, so our "range" lies between the array sorted in increasing order and the reverse of that array. How can we partition that "range"? Imagine a pseudo array of all the permutations sorted in lexicographic order; it has a first element and a last element. We need to pick some elements of that hypothetical array in increasing order, pair them up, and tell the follower iterators to generate permutations in lexicographic order between each pair. How do we get those pairs? The partitioning algorithm is very simple; it has obvious performance issues, but you will see that it can be improved easily.
    • Let's say we want the permutations of the array [1,4,3,5,2].
    • We sort the array to get the first element of our imaginary lexicographic array, so now we have the first element [1,2,3,4,5], and surely the last element of the imaginary array is [5,4,3,2,1].
    • We can get intermediate arrays between those two limits by repeatedly "pushing" the last element to the left by one step (it could be more steps if the array is bigger and there are more tasks), hence getting arrays of higher lexicographic order, until we reach the last array [5,4,3,2,1].
    • Let's say we have 4 tasks and want to partition the lexicographic order of the aforementioned array into 4 ranges:
    • (1) [1,2,3,4,5] (2) [1,2,3,5,4] (3) [1,2,5,3,4] (4) [1,5,2,3,4] ⇒ up to our reversed array (5) [5,4,3,2,1]
    • So the first task generates between (1) ⇒ (2), the second task (2) ⇒ (3), the third task (3) ⇒ (4), and the fourth task (4) ⇒ (5).
  • Obvious issues: The amount of work between tasks is not fair at all; the first task generates 2 permutations and finishes, whereas the last task generates about 30-40 permutations out of the 120. Fortunately, this can be solved by doing the "pushing" more smartly, rotating suffixes of the array. Even this current implementation is about 25%-30% more efficient than the serial version. Here is an implementation of the parallel and serial versions of iter permute.

  • Functools in Chapel: The current implementation of lru_cache is not user friendly at all, because of the need to pass "an example" of the expected output. I feel like the hashing step could be done in a better way, and the implementation of the ordered map should be enhanced and maybe become an independent data structure.
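The permutation-boundary scheme above can be sketched in Python, my reference language for prototyping. The helper names (`next_permutation`, `boundaries`, `task_chunk`) are hypothetical, the swap-based boundary generator assumes the task count is at most the array length, and each chunk is half-open, so the exact per-task counts differ slightly from the text:

```python
def next_permutation(a):
    """Return the next permutation of a in lexicographic order (standard algorithm)."""
    a = list(a)
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return None  # a was the last (fully descending) permutation
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    a[i + 1:] = reversed(a[i + 1:])
    return a

def boundaries(data, ntasks):
    """Split the lexicographic 'range' of permutations into ntasks chunks by
    repeatedly pushing the largest element one slot to the left.
    Assumes ntasks <= len(data)."""
    cur = sorted(data)
    bounds = [list(cur)]
    for _ in range(ntasks - 1):
        i = cur.index(max(cur))
        cur[i - 1], cur[i] = cur[i], cur[i - 1]
        bounds.append(list(cur))
    bounds.append(sorted(data, reverse=True))  # the final permutation
    return bounds

def task_chunk(lo, hi, last=False):
    """Yield permutations from lo (inclusive) up to hi (exclusive; inclusive when last),
    as each follower task would."""
    cur = list(lo)
    while cur != list(hi):
        yield tuple(cur)
        cur = next_permutation(cur)
    if last:
        yield tuple(hi)

bounds = boundaries([1, 4, 3, 5, 2], ntasks=4)
chunks = [list(task_chunk(bounds[t], bounds[t + 1], last=(t == 3)))
          for t in range(4)]
```

The chunk sizes for this example come out to 1, 3, 14, and 102 of the 120 permutations, which makes the load-imbalance problem described above concrete: the naive "push by one" boundaries leave almost all the work to the last task.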


Expected outcomes

Here is a complete list of what I expect to be the outcome of the project, including extra stuff in case I have enough time.


itertools

This is still, for sure, the main focus of the project. Referencing the main page of the official Python itertools documentation: I don't see the need for the iter count method from Infinite Iterators, because Chapel has ranges, which are much faster; as for cycle and repeat, they are already implemented (chapel-lang/chapel#12657). All Iterators terminating on the shortest input sequence should be implemented, except the ones already present in Chapel, like the reduce method. The combinatoric iterators should certainly be implemented. I guess the already-present FunctionalOperations should be merged with this project. As for more-itertools, toolz, and others, some of their utilities are crucial and elegant and should be a priority, not extras: tabulate, augmenting iterators, and set operations like difference and union of iterables from toolz.
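As a pin-down of the semantics to port, tabulate follows the classic itertools-recipes definition: map a function over consecutive integers. A minimal Python sketch:

```python
from itertools import count, islice

def tabulate(fn, start=0):
    """Yield fn(start), fn(start + 1), fn(start + 2), ... indefinitely."""
    return map(fn, count(start))

# take the first five values of the infinite stream of squares
squares = list(islice(tabulate(lambda n: n * n), 5))
```

A Chapel version would presumably expose this as a (possibly parallel) iterator over a range rather than an infinite stream.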


functools

Truly one of Python's gems for programmers, and indispensable in some situations. The priority for me here is a complete and efficient implementation of the lru_cache function, which requires implementing an ordered dictionary to keep track of the order of function calls and the corresponding output for a hashed chunk of inputs; here again is the gist with my minimal implementation. Another very interesting utility from Python is the partial class; thanks again to tuple expansion, it could easily be implemented in Chapel. Here is a gist showing how we would invoke the partial class and pass input to it, so you can create new derived functions that have some input parameters pre-assigned; here is a very good blog post about partial objects. Consider a situation where we have a pipeline of functions accepting different numbers of arguments, but we want to pass only one value through that pipeline. There are other very interesting ideas from underscore.js if we have extra time.
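For reference, this is the Python behavior being ported: lru_cache memoizes on the argument tuple, and partial pre-assigns leading arguments. The function names below are illustrative, not from the proposal's gists:

```python
from functools import lru_cache, partial

calls = []

@lru_cache(maxsize=128)
def slow_square(n):
    calls.append(n)        # record only the real (uncached) invocations
    return n * n

slow_square(4)
slow_square(4)             # second call is served from the cache

def add3(a, b, c):
    return a + b + c

add_10 = partial(add3, 10) # derived function with the first argument fixed
```

The Chapel equivalent would hash the whole input tuple as one entity (as discussed in the compiler-state section) and use tuple expansion to forward the remaining arguments.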


operator

  • This is in case I finish the other outcomes in record time and can't find anything else to work on, but my hope is to have at least the bare-bones structure of the module within the GSoC period, while I have a dedicated mentor with me, and to continue working on it after GSoC ends.
  • The point of the module is to provide the same operations, like addition and multiplication, in an explicit, named form: if you want the addition function as a value, you can get it from operator. It should also be a little faster than the normal operations in Chapel, so it should be implemented in C and exposed from Chapel. This is the first thing I'd like to do if we have extra time, before the extras of itertools and functools.

Why is this task exciting to you? Why did you choose this particular task? What do you hope to learn by working on it?

Like I said above, I'm at a phase in my education as a developer where I need to write structured, well-designed code and begin to care about performance. I chose this project because functional programming is really cool, and it's not every day that I can find people to mentor me in building a relatively big library. The other project I was interested in was the Parser project, but I didn't think I was ready to start studying compilers and parsing yet. The most important objective is to learn how big libraries are structured and designed, and maybe study some new technology and hear about things I never knew existed in the real world.


Provide a rough estimated timeline for your work on the task. This timeline should take into account any non-coding time, such as exams, GSoC midterms, and vacation. Describe milestones you expect to achieve as you work towards the task.

The program runs from May 4 to August 24, which is about 16 weeks. I'm going to be free, because universities will finish early given the current state of the world, so let's divide the work over 15 weeks, considering the 3 evaluations/milestones, including the final evaluation. Roughly speaking, there will be about 5 weeks between evaluations, so here goes my proposed timeline:

First Milestone

  • This milestone would be about 6-7 weeks.
  • These weeks would be dedicated to the serial and parallel itertools module. Out of these 40-50 days, about 10-15 days would be dedicated to proposing the API design and the feedback cycle.
  • The challenge in this milestone is tuning the partitioning algorithm for the combinatoric iterators and getting good performance compared to Python, where this module is implemented in C. However, the methods I have already implemented have approximately equal or better performance compared to Python, and the parallel iterators in Chapel perform noticeably better than Python, as expected.
  • Here is a list of the expected outcomes for the first milestone; asterisked elements are meant for extra time. Each method would have a link.
Method (source): parallel issues and other stuff.
  • chain (itertools): no parallel concerns; you just have a single output stream.
  • compress (itertools): here is a gist for the serial and parallel versions.
  • dropwhile (itertools): here the leader has to do all the thinking, and the followers just yield everything in their whole range. Very surprisingly, the parallel version is about 60x-80x faster than the serial version on some trials, but about 0.5x on others. Here is a gist for the serial and parallel versions.
  • filter (itertools): this one was a bit weird; the parallel version was constantly slower than the serial version, by about 10%-15%. Here is a gist for it; I used an array with about 1000 elements this time.
  • starmap (itertools): this one was very crucial for my proposal, and especially for the itertools design, because through it I learned how variable packing and un-packing works in Chapel. Here is a gist for it; the parallel iterators were more efficient, as expected.
  • groupby (itertools): a very interesting one; imagine you have a table of data rows and want to group the rows into sub-tables by a common trait: this is your guy for that. I have the serial version here in this gist; it can be parallelized easily if we gather the traits in a set with the leader iterator and partition that set between followers.
  • islice (itertools): easily implemented.
  • tee (itertools): the parallel version would be a bit tricky, but very similar to the unnecessary count method here, because I need a reference point for partitioning the range for each follower.
  • permutations (itertools): one of the major parts of the project; a parallel iterator for this function needs a partitioning algorithm for the lexicographic range of all permutations, and it needs a lot of tuning. Here is a gist implementing both serial and parallel iterators with a very simple partitioning algorithm.
  • combinations (itertools): same as permutations.
  • enumerate (PEP 279): I don't see the point of making it parallel; it is used for clarifying and annotating the iterable, not for computing something. However, it isn't hard to do.
  • combinations_with_replacement (itertools): the same as the combinations and permutations algorithm with extra steps; the C implementation in the Python repository is good enough.
  • locate (more-itertools): the parallel version would partition the domain, and every follower would work on its own part.
  • tabulate (more-itertools): one of the classics of the functional paradigm. Implementation-wise it isn't an issue for serial or parallel iterators, just simple partitioning of the input domain; it could be tricky because the input data structure must overload the + operator.
  • sample (more-itertools): very important in my experience when working with signals and Fourier stuff; we can introduce two versions, like in MATLAB or interpose in toolz: upsample and downsample.
  • *union, *intersection, *difference (underscore.js): mimicking set operations is crucial in some situations without needing a Set data structure, again when doing signal processing. Just one function, one line, without converting to/from arrays and sets.
  • *without (underscore.js): would come in handy in a lot of situations, without writing an ugly for loop inside your code.
  • max, min: should be generic enough to use on any iterable of data structures supporting <.
  • *mapcat (toolz): really cool, and I can think of tons of situations where you would need it; think of image filters.
  • *frequencies (toolz): think of an image histogram.
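To pin down the starmap semantics mentioned above: the Python version unpacks each tuple into the function's arguments, which is exactly what Chapel's tuple expansion makes possible. A short illustration:

```python
from itertools import starmap

pairs = [(2, 3), (4, 5), (10, 2)]
# each tuple is unpacked into the call: pow(2, 3), pow(4, 5), pow(10, 2)
powers = list(starmap(pow, pairs))
```

The parallel Chapel version would partition `pairs` among follower iterators and expand each tuple into the function call.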

Naively speaking, the only challenge here is getting the best performance out of the partitioning algorithm for the combinatoric iterators, and finding the best algorithms for generating the combinatorics; there is a ton of literature on that problem. As for the partitioning algorithm, I want it to work like simulated annealing, starting by rotating long suffixes and then cooling down, because the intervals in lexicographic ordering are very unevenly distributed.

I am completely aware that the current implementations in the gists provided are not appropriate to merge: no tests, no input checking, and nothing is thrown.


Second Milestone

  • This milestone would take 5-6 weeks.
  • Implementation of the ordered map and discussion of its API design. We also need an implementation of a priority queue like the one in Louis Jenkins's data structures mason package. That is about 1.5 weeks, for example.
  • Implementation of the lru_cache utility, like the exact one in Python's standard functools module.
  • Implementation of partial objects. It would provide us with a curry-ish utility, if we nest multiple partial objects inside each other, without breaking the Chapel compiler.
  • Some selected methods from the Boost functional library and underscore.js, if there is still time in these 5 weeks.
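The ordered-map idea in the second bullet is essentially what backs Python's lru_cache. A minimal, hypothetical Python sketch of the eviction logic (using OrderedDict in place of the ordered map / priority queue the Chapel version would need; `minimal_lru` is an illustrative name):

```python
from collections import OrderedDict

def minimal_lru(maxsize):
    """Decorator caching on the positional-argument tuple and evicting
    the least recently used entry when the cache is full."""
    def decorate(fn):
        cache = OrderedDict()
        def wrapped(*args):
            if args in cache:
                cache.move_to_end(args)        # mark as most recently used
                return cache[args]
            result = fn(*args)
            cache[args] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)      # evict the least recently used
            return result
        return wrapped
    return decorate

calls = []

@minimal_lru(2)
def double(n):
    calls.append(n)        # record only the real (uncached) invocations
    return 2 * n

double(1); double(2)       # both cached
double(1)                  # cache hit; refreshes 1's recency
double(3)                  # evicts 2, the least recently used
```

The recency bookkeeping (move_to_end / popitem) is exactly the part that makes the "L" stand for least rather than last.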

My goal in this period is to do at least an implementation for a bunch of utilities from functools and functoolz :

  • lru_cache, partial, and, as a natural consequence of having partial, it would be cool to have pipe from toolz, to get actual piping functionality in the language.

  • lru_cache is a truly magnificent utility, used extensively in state-space search algorithms in AI. Anyway, please check this minimal implementation of lru_cache.

    • We need to use an ordered map to keep track of function calls and their order; it is just an ordinary map plus a linked list for ordering, but we need a priority queue like the one in Louis Jenkins's data structures mason package to make the "L" in lru_cache stand for least, not last.
    • We would store function calls as hashed tuples of the inputs, after converting the tuple to a string.
    • The API requires the user to pass the input to the call function as a tuple, which I think is really fine.
    • The ugly part of my current implementation is how you instantiate the caching object, because you need to give it an example of the output to be able to build the ordered dictionary, but I guess we can overcome this issue.
    • This task would take considerable time, because we need a completely implemented data structure and a discussion of the API design with the mentor.
  • The partial object is very useful: imagine having a pipeline of functions, like in bash, where the number of arguments differs per function, but we need to propagate one variable through this thread of functions while every other parameter in each function holds a fixed value. A very not-even-hello-world example of how things should work is here in this gist. One more thought: if the user REALLY needs function currying, he/she can do it with partial objects.

  • The pipe function is really simple and cool; there is not much to say about it. Here is a "complete" implementation of it in this gist.
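The pipe semantics from toolz reduce to a left fold over the function list. A sketch, assuming every stage is unary (which is where the partial objects above come in, to adapt multi-argument functions):

```python
from functools import reduce

def pipe(value, *funcs):
    """Thread value through funcs left to right: pipe(x, f, g) == g(f(x))."""
    return reduce(lambda acc, fn: fn(acc), funcs, value)

result = pipe(3,
              lambda x: x * 2,   # 3 -> 6
              lambda x: x + 1)   # 6 -> 7
```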


Third Milestone

  • This is about 3-4 weeks.
  • Documentation, documentation, documentation.
  • I separated the documentation period from the implementation period so I can look at the already-implemented and tested methods with fresh eyes, iterate over them again, and add new tweaks here and there.
  • If, and that is a big IF, I finish the documentation fast, I would really like to start on the operator module: even just the general layout of the module and how we will handle having the backend in C, so I can continue working on the module after GSoC and end up with a complete functional programming module like the one in Python, with the extra parallel touch of Chapel.

@omar-3 commented Mar 29, 2020

Hi @ben-albrecht and @krishnadey30.
Please quote anything you want to see edited and tell me your opinion!

@krishnadey30

Looks really good to me. Good work. Please make sure to submit the final PDF.

@ben-albrecht

@omar-3 - this looks a lot better!

At a high-level, I find your timeline a bit on the ambitious side, due to the sheer number of functions/iterators listed. Design, implementation, code review, and possibly some performance optimization will be required for each one. It would be fine by me if you wanted to shift some of the goals into stretch goals.

chain: don't need, you just have a single output

Could you clarify what you mean here?

enumerate: I don't see the point of making it parallel, It is used for clarification and annotating the iterable, not computing something

I could see a use-case where you loop over an iterator in parallel and use the enumerated elements to assign the result into an array, e.g.

var A: [1..100] int;
forall (i, value) in enumerate(someIterator, start=0) with (ref A) {
  A[i] = someComputation(value);
}

but it's not terribly important.

@omar-3 commented Mar 30, 2020

@omar-3 - this looks a lot better!

At a high-level, I find your timeline a bit on the ambitious side, due to the sheer number of functions/iterators listed. Design, implementation, code review, and possibly some performance optimization will be required for each one. It would be fine by me if you wanted to shift some of the goals into stretch goals.

I will push some itertools utilities into the stretch goals, but I really think we could pull it off given the design-and-feedback cycles. I will push some things into the extra time period; it won't hurt, and we can discuss that if the project is selected. I want to be a bit pressured; this is a personality thing. And judging by the already implemented part of itertools in the master repository, I think the throughput of these weeks will be high enough to match the code quality of the master repository.

chain: don't need, you just have a single output

Could you clarify what you mean here?

The input is a bunch of iterators to be concatenated together, and the output is just one big iterator. If we want to parallelize that, we need a global variable shared between all the follower iterators, because the output is just one chunk of data.
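In serial form the semantics are simple; a Python sketch for reference (the parallelization question is exactly what this version hides):

```python
def chain(*iterables):
    # Yield every element of each iterable, in order, as one big stream.
    # A parallel version needs shared knowledge of each iterable's length
    # so each follower knows its offset into the combined output.
    for it in iterables:
        for x in it:
            yield x

result = list(chain([1, 2], (3,), range(4, 6)))
# result == [1, 2, 3, 4, 5]
```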

enumerate: I don't see the point of making it parallel, It is used for clarification and annotating the iterable, not computing something

I could see a use-case where you loop over an iterator in parallel and use the enumerated elements to assign the result into an array, e.g.

var A: [1..100] int;
forall (i, value) in enumerate(someIterator, start=0) with (ref A) {
  A[i] = someComputation(value);
}

but it's not terribly important.

oh okay :D ... I will drop it.

I'm working on the final version; I have clarified a lot of things, especially in the partitioning algorithm part, and replaced some things.

@omar-3 commented Mar 30, 2020

And each milestone is about 50-60 days, which is A LOT of time considering this is a full-time commitment. What do you think?

@ben-albrecht

I will push some itertools utilities into the stretch goals, but I really think we could pull it off given the design-and-feedback cycles. I will push some things into the extra time period; it won't hurt, and we can discuss that if the project is selected. I want to be a bit pressured; this is a personality thing. And judging by the already implemented part of itertools in the master repository, I think the throughput of these weeks will be high enough to match the code quality of the master repository.

That's fine. I don't think it matters too much, as long as the proposal is feasible. I like when timelines are more conservative. Other mentors may interpret a highly ambitious timeline as a big plus (e.g. @LouisJenkinsCS :) )

The input is a bunch of iterators to be concatenated together, and the output is just one big iterator. If we want to parallelize that, we need a global variable shared between all the follower iterators, because the output is just one chunk of data.

This sounds like an interesting design question that may sink into the design of how Chapel iterators work.

I'm working on the final version, and I clarified a lot of things especially in partitioning algorithm part and replaced some things.

OK - I probably won't have another chance to look at this again. It looks very nice in this form. Good work!

@omar-3 commented Mar 30, 2020

That's fine. I don't think it matters too much, as long as the proposal is feasible. I like when timelines are more conservative. Other mentors may interpret a highly ambitious timeline as a big plus (e.g. @LouisJenkinsCS :) )

I intend to use his priority queue in the caching utility in functools :). I guess an ambitious timeline is a big plus because it indicates that the student understands the project, rather than just barely meeting the minimum requirement. But I will be a bit conservative in the final version.

This sounds like an interesting design question that may sink into the design of how Chapel iterators work.

Yes, I found that it is sometimes a bit more convenient to have a shared object between the followers.

OK - I probably won't have another chance to look at this again. It looks very nice in this form. Good work!

I know, but really thank you.

@LouisJenkinsCS

I'll take a closer look at this when I get the chance. I appreciate the efforts and enthusiasm, but I'm concerned about tackling the idea of representing lazy computation and chaining computations together w/o first-class function support. These things should be overloaded onto _iteratorRecord types.

@omar-3 commented Mar 31, 2020

I'll take a closer look at this when I get the chance. I appreciate the efforts and enthusiasm, but I'm concerned about tackling the idea of representing lazy computation and chaining computations together w/o first-class function support.

Functions in Chapel are first-class objects with some unimplemented features (chapel-lang/chapel#15234). I didn't propose any hacks to get around the issue of returning function objects anywhere in the proposal. I didn't include any method that needs that, because I figured that if I included them, they would need to be rewritten in future releases once Chapel gets that functionality anyway, so it wasn't worth the effort.

These things should be overloaded to _iteratorRecord types.

I didn't know such a thing existed, but I found issues #12814 and #12859 talking about it. I will check them out and get back to you.

@omar-3 commented Apr 2, 2020

Hi @LouisJenkinsCS, I went through the issues I found discussing the _iteratorRecord machinery, in which you proposed FunctionalOperations.chpl. Although that invalidates 80% of my proposal, I must say that overloading the whole proposed itertools module onto _iteratorRecord looks much cleaner and nicer. But how would someone support parallel iterators by overloading _iteratorRecord? @ben-albrecht

@omar-3 commented Apr 2, 2020

And performance-wise, I don't think there is much of a difference? I think this kind of module aims for expressibility more than performance, as you said in the issue. But anyway, I guess that, implementation-wise, it won't differ for serial iterators.

@LouisJenkinsCS

@omar-3

I agree that implementing itertools and other functional iterators by adding everything to _iteratorRecord would be the ideal scenario. However, as mentioned, _iteratorRecord is serial-only and will not even call a parallel iterator (in fact, it even throws an unrelated, buggy compilation error when trying to force a parallel iterator: TIO).

This makes the iterator tools useless in distributed parallel iterations (performance would be borderline unusable), but very convenient, time-saving functionality for shared memory. Although, to be fair, Chapel will never have anything like Hadoop's MapReduce level of performance anyway, not even close, I would still be interested in whether we could devise some kind of 'hack' or workaround to make this work with iterator records, even if it involves updates to how the compiler resolves an _iteratorRecord to the iterator it uses.

@LouisJenkinsCS

Honestly, something like Spark would be super cool to have, say, in addition to itertools. Maybe what could happen is that you have some wrapper that holds all the callbacks in some container and applies them in order. A proof-of-concept that does not compile is here: TIO. I haven't a clue why it isn't compiling; it's due to either some new change or some compiler bug that I'm unaware of. It illustrates the point, though.
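A minimal Python sketch of that callback-container idea, purely for illustration (the actual proof-of-concept above is Chapel; every name here is made up):

```python
class Lazy:
    """Holds a source plus a list of callbacks, applied in order only on demand."""
    def __init__(self, source):
        self.source = source
        self.ops = []          # deferred (kind, fn) pairs, applied at collect()

    def map(self, fn):
        self.ops.append(("map", fn))
        return self            # returning self allows chaining

    def filter(self, pred):
        self.ops.append(("filter", pred))
        return self

    def collect(self):
        # Only here is any work done: each element flows through every op in order.
        out = []
        for x in self.source:
            keep = True
            for kind, fn in self.ops:
                if kind == "map":
                    x = fn(x)
                elif kind == "filter" and not fn(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out

result = Lazy(range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect()
# squares of 0..5, keeping the even ones -> [0, 4, 16]
```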

@omar-3 commented Apr 3, 2020

Having a Chapel API for Spark, like PySpark, would be super useful and would help Chapel find its way into new markets.
