Summarization of Long Documents using Transformers
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "summarization.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "EABqkrjYerSx"
},
"source": [
"# Colab output wrapper"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"id": "z3HW7x111o9w",
"outputId": "04c13547-941a-48ae-9f7d-ea740779ebf3"
},
"source": [
"# wrap long output lines in Colab cells by injecting CSS before every cell runs\n",
"from IPython.display import HTML, display\n",
"\n",
"def set_css():\n",
" display(HTML('''\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" '''))\n",
"get_ipython().events.register('pre_run_cell', set_css)"
],
"execution_count": 25,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fcU3meX9e0b5"
},
"source": [
"# Install Transformers"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 425
},
"id": "JUgW5Gy10DxC",
"outputId": "22fe1d2e-97d1-4b35-e8fd-d96d043d189c"
},
"source": [
"# install transformers with sentencepiece\n",
"!pip install transformers[sentencepiece]"
],
"execution_count": 26,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Requirement already satisfied: transformers[sentencepiece] in /usr/local/lib/python3.7/dist-packages (4.10.3)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (5.4.1)\n",
"Requirement already satisfied: huggingface-hub>=0.0.12 in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (0.0.17)\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (1.19.5)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (2019.12.20)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (2.23.0)\n",
"Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (4.8.1)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (4.62.2)\n",
"Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (0.0.46)\n",
"Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (21.0)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (3.0.12)\n",
"Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (0.10.3)\n",
"Requirement already satisfied: protobuf in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (3.17.3)\n",
"Requirement already satisfied: sentencepiece==0.1.91 in /usr/local/lib/python3.7/dist-packages (from transformers[sentencepiece]) (0.1.91)\n",
"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from huggingface-hub>=0.0.12->transformers[sentencepiece]) (3.7.4.3)\n",
"Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers[sentencepiece]) (2.4.7)\n",
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers[sentencepiece]) (3.5.0)\n",
"Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.7/dist-packages (from protobuf->transformers[sentencepiece]) (1.15.0)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[sentencepiece]) (3.0.4)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[sentencepiece]) (2.10)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[sentencepiece]) (2021.5.30)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[sentencepiece]) (1.24.3)\n",
"Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers[sentencepiece]) (1.0.1)\n",
"Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers[sentencepiece]) (7.1.2)\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2jeHwHDYe_5i"
},
"source": [
"# Read input file from Google Drive"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"id": "l9b_kRpd0MaA",
"outputId": "34b4c5ac-4e43-45f2-9a41-112aa3debfb2"
},
"source": [
"# open and read the transcript from Google Drive (assumes Drive is already mounted at /content/drive)\n",
"with open(\"/content/drive/MyDrive/Colab Notebooks/data/summarization/ami/ES2002c.transcript.txt\", \"r\", encoding=\"utf-8\") as file:\n",
"    FileContent = file.read().strip()"
],
"execution_count": 27,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "YQZdCUmB0YeE",
"outputId": "d06fc04d-332e-4dd6-bf1d-6f0914248ed8"
},
"source": [
"# display file content\n",
"FileContent"
],
"execution_count": 28,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"\"Yeah. Yeah, sure. It kinda does make sense, doesn't it, because when we get into the end of meeting we're kind of talking about action and design as opposed to background. Everything I have is kinda background. Mm-hmm. Uh that sounds. Sure. Okay. Sure. Yeah, cool. Why don't I get that? Hmm. Okay. Okay. Um alright so c is it function F_ eight? Hmm. Come on. I think it's working. Okay great s so let me just start this. Okay great. So um uh s move on. Uh-huh oh where'd it all go? It's not good. Okay lemme just see where I can find it. This looks more like it. I think I just opened up the template. Sorry about that. Okay alright so let's have a look here. Okay so this was the method that um I've taken. Uh basically what I wanna do here, before we get into it uh too far, is I want to show you all the background information I have that I think we need to acknowledge if we want this to be successful. And uh and then sorta g go through some of the way that I've dealt with that information, and then sort of bring us all together into it to see sorta see how this fits in with the overall vision. Um so I've tried to take a whole lot of market research and summarise it for us, and then ide identify uh trends that are are sort of in sync and are important to our our uh p project plan that we have so far, and then uh initiate a kind of discussion on design options so that it sorta helps us to to narrow in on on aspects that will inform other uh other elements of the of the project. Does that make sense, tha that sort of strategy? I thought that that will impact on the rest of what we do, so that's why I suggested we get in this. Okay so out of um different uh figures and ratings ob uh of people in general, um consumers in general, the number one thing that was found was that uh the br t television remote control, a fancy look and feel, okay, and not, it specified, not a functional look or or feel, uh b f f fancy. 
Um however, this is where we kinda have to be very, I think, creative about it. Number two was that it be innovative. Okay so that tells me that we have to find a way to be innovative without a adding just unnecessary um sort of functional bits to it. Uh and third priority uh for ease of use, so again that kind of gives us a general picture of how it has to be, um quite user friendly while still having technology. So it I'll just say right away as a bit of a foreshadowing into how we proceed with this in terms of m marketing, is that I think um what we should think about is how the um about how the innovation uh contributes to the look and feel, and not so much to the functionality of it. For example like when you pick it up and push it like it all lights up or something, you know what I mean, like, or it's got something else to it that just seems innovative because obviously the thing that the message here is ease of use. So how do you make innovation make something more more easy to use? Well that's I guess where we're gonna go with this. Okay then there's the other aspect of the back the the market um research I have here is on fashion style, okay, which as we've agreed is a priority. Uh top European fashion trend um that I read about says there's this emerging theme of fruit and vegetables, okay, especially in clothes and furniture. And when I first saw that I thought hmm, well do we want to actually try and think about this trend and how we add something to it, or we get right into it, or we completely steer away from it, do you know what I mean? So my my feeling is that we w do want to observe this trend, but we want to think also about the fact that it sort of has to fit in with something which is not specifically electronics. 
Um 'cause I think what we're in what we're in is partly sort of home decor, partly something like a computer, um so I think we might wanna be careful about how you know how quickly we create like a remote control in the shape of an apple or something, I think that would be pushing it. And then in terms of m material trends are for things to be soft and spongy and sort of, you might say ergonomic or or friendly to handle, which is which also in indicated that last year this was this was not the case. So um probably a lot of the competition on the market will be still in last year's mode, so if we try and really capitalise on that, I think that'll be in our favour. Um So these this is the summary of everything. Um style is number one uh thing in the in the market of who we're selling to. Uh innovative design technology's also a must in that it's seen it'd be seen to be uh cutting edge, uh but ease of use t has to be insured throughout. That was like the number three thing. And then at the end there are vibrant natural colours um that's the way I interpreted it anyway, softness in materials, shape, and function, and so I've written at written at the bottom to give us sort of a context of discussion, Mac iPods, something which is, I'd have to say very high-tech, ten gigabytes, whatever, but when you hold it in your hand there's like no buttons. You know what a Mac iPod is? I'm thinking however Mac iPod is sort of last year's because it's very hard and sort of glassy and glossy, so I'm thinking if we imagine that we're taking some of the features of a Mac iPod and we're then making it s more of like a more of like a comfortable type of or more of like a maybe more vibrant to friendly thing to have. 
Um and then so this is w with all that information what I'm what I'm suggesting in this slide here is that we we take these ideas, and as we get into more the more um techni like sort of production side of things, that we think about shape, materials, and themes or series that go throughout. Sort of like a I dunno like um we think of some kind of a thin theme that unifies it all, that we agree on, uh sorta like a marketing identity. Um Does that make sense? Yeah. So so like I threw out a few ideas there just to kinda get us thinking along those lines like lemon, lime, I dunno, green colours, pe whatever, it's just an idea, 'cause I'm thinking that some of these ideas will seem quite coherent if we use them in terms of their what people associate this them with in terms of texture, shape, colours, things like that. Like um the ones the ones which I'm most fond of in terms of giving like a theme to it would be like um like lemon or something like that, you know something which is, like you see a lot in in other areas. Like I see lots of websites and things that seem to associate with like lemon and lime and So anyway it's just just an idea. I'm thinking maybe we could incorporate some of these features into a fairly um into something which is which seems to have something to it which is almost gimmicky because like um like something to do with like lighting within it. Like you know just within the simple sense, when you pick up a phone and touch a button it uh lights up, q usually the buttons light up. How can we build on that? Maybe like it could light up in different colours or something or or people could buy the buy the control and then it comes with different like covers or something so. Anyway those are that's all I have, but uh hopefully we can we can revisit those ideas when we get into Yep. Did you press F_ eight? Yeah. Okay, do we have a corporate colour scheme? I didn't know. Okay. Okay. Okay. Yeah. Mm-hmm. Mm. Mm. Great. Lots of good information there. 
It's a shame the cable wasn't just in the middle of the table, huh? Just um It takes a second, doesn't it? Mm-hmm. Mm-hmm. Mm. Mm-hmm. They're standard, aren't they? Yeah. Hmm. Mm-hmm. Mm-hmm. Mm-hmm. Mm-hmm. Mm-hmm. Hmm. Hmm. No. Yeah. Mm-hmm. Mm-hmm. Hmm. Yeah. Hmm. Hmm. Mm-hmm. Mm-hmm mm-hmm mm-hmm. Mm-hmm. It can't be curved. Okay. Uh question on can I ask a question? Okay. Can we uh power a light in this? Can we get a strong enough battery to power a light? Okay. So maybe one of the things we can just try and include is a really good battery. Well I mean I'm thinking it might be That for uh this to be a high-tech thing it's gonna have to have something high-tech about it and that's gonna take battery power, and to make that to make that a realistic goal I think one of the issues that will come up later is, can the battery power it? Illuminate the buttons. Yeah it glows. Well m I'm thinking along the lines of you're you're in the dark watching a D_V_D_ and you um you find the thing in the dark and you go like this, and that's what everybody does. Oh where's the volume button in the dark, and uh y you just touch it, or you just pick it up, and it lights up or something. Like a phone, yeah yeah. Whereas with phones, people charge them once a week. We're gonna need to put in a really good battery so people don't have to charge their r remote control every few days. But are people gonna wanna shake their movie controller? Right. Sure. Okay. Right. Mm-hmm. Okay, great. Okay. Okay. Yeah. Yeah. What's that? Yeah. Okay. Well I mean I'm thinking that what we need to do is have something that kind of unifies a lot of the different concepts, and if we think that what we are w our number one marketing motive is um the look and feel. So for the look and feel to seem coherent and not just sorta bits and bits and pieces of of concept and technology or or whatever or fashion, then we should have it kind of come back to one thing that we kind of all sorta can visualize. 
Um maybe what we could do is t th think about a concept which touches b back to the on the um the colour, you said company colour yellow. I mean if we think of something, like I was saying also lime and lemon you know, what can we come up with something where we we try and associate it with with like the series. We just come up with something like that we kind of use it as a theme to inspire the shapes and things. Yeah. Yeah. No no no not at all. It's more more just that we we think about like what it is we're trying to achieve, so and then we have one one sorta theme that we stick with. Do you know what I mean? Right, sure. Yeah. Um can we yeah like to and wha like do you have a Can you like yeah just t we can visualize it. Okay. Okay. Yep. Mm. What would it achieve? Well L_C_ well I'd when you used to mention the L_C_D_ I'd think I wonder what that would be about. And the th the thing I could see it helping with would be if it was somehow connected with um listings. So as you scroll through, 'cause we said we might have a jog dial, so as you scroll through your stations you can y it actually tells you what it is. Right, okay. Mm-hmm. Mm-hmm. Mm not real Mm-hmm. No. Mm mm. Well i I was just Yeah. Yeah. Yeah, sure. Um I've had kinetic things before, and the the the one issue we need to keep in mind with them is that you're committing the user to moving it, and watches yeah Sure, okay, right, okay. Support for it. I mean just it's just worth pointing out because like I've I've known I've known people to have kinetic watches that they wear all the time, and it's just like magic because it's always powered and there's no battery. I've also known people to have things like like a jewellery watch they wear from time to time, and they eventually just say it's just too much of a nuisance because I don't wear it all the time. Like remote control is similar, you're away on vacation, I dunno whatever, you something, and it just starts to get worn down. 
So we should think about Yeah. Yep. And this size here, I'd suggest this be small, like quite small. Um just a a lot of the um I mean one of the things running through my mind right now, I realise we're being efficient to wrap up the meeting and have lots of decisions made, um but we are leaning quite a bit to the side of being low-tech, rubber buttons plastic frame, it's almost like we're reproducing the same old remote control that's out there. Should we think about how we are actually getting this high-tech user friendly uh um theme like what is it that we're u we're using to to achieve those goals? Like Okay so so backlighting, that would be good. Yeah clear, that'd be Okay. Yeah sure. Yeah that'd be really good. Yeah. Sure. Yeah, yeah. Yeah. Sure. Yeah they they emanate a light through it. 'Kay. Yeah, mm-hmm. Um and then the other thing that we we're s we've committed ourselves to achieving is simplicity, and so I'm thinking maybe should we try and think about having something like um some kind of an innovative concept about how the um the volume and the channels are controlled, 'cause that's the main thing people will f wanna do. Could we use like a jog dial, like a nice just sort of round, somewhere on it where you just roll it? Or Yeah. Mm-hmm. Well why don't we do it like a mouse then? Yeah. Mm-hmm. Yeah. Yeah, sure. Mm. Yeah. But that's not a bad thing is it? Because when you think about it, the alternative is to go push the button. Jog dials are much easier than that. You just roll. Yeah. Yeah. Yeah like I mean if we if if we keep coming back to this board here, I wouldn't be surprised if we could take this idea, imagine that, I dunno, that it's within the shape of the hand, it's quite small I dunno. Yeah. It's small, and that we've got like the the l slogan somewhere like on the casing at the side, and that yeah well I mean isn't that what we just h said said we s just have to decide now? Yeah. 
And then like a jo And then like a jog dial somewhere that fits in with the shape of it like I dunno like here, in with the It would get bumped, it's doesn't really fit with your hand. Yeah. Or maybe just fit it in like down the middle here. A jog di Yeah. It's kind of yeah Yeah. Huh. Um Yeah I think the jog dial, you know it just after you drew that, what if it was flat and you just spun it, that'd be great. Yeah. Yep. Sure, yeah, yeah and materials we sorta said we'd do plastic and rubber, didn't we, and I think maybe we should try and stay away from just the big protruding rubber buttons, 'cause that'd just be so standard. Something a bit more flush, yeah, or maybe have rubber incorporated into the case as well, so that it has and also t plastic I've seen can get really textured, so you can get plastics that actually feel soft in your hand. They feel kind of like um, you get pens now and then that you'd think that they were rubber but they're not, they're actually just plastic that's textured, kind of a little bit like Okay. Okay.\\nIs it working? And so think of this concept. Um to research it I've um had a look on the the homepage again. It's provided me with more examples of um previously existing c remote controls. Um there's a wee bit of discussion about the other existing ones there, um so I've taken the um suggestions from them and tried to incorporate them into this um So then this we're looking for um suggestions on size th um size of control and the buttons, um the shape of the control, and whereabout the buttons should be located on the control. Um what I found from the research is that most the current controls are just basically big bricks with loads of buttons all over them. Um they're not very attractive to look at, and they're not very comfortable to hold, they're I just hold 'em like big bricks, and they're very easily lost. Um they tend to be very dark colours, so if there are shadowy places down the side of couches you can't really see them. 
Um the the controls themselves tend to use a very inconsistent colour scheme. Um for instance, the stand-by button isn't always red, uh it really should be. It's uh something the user then uh identify with. This is a red switch off, that's how it should be. Um I'm not sure if there's any other examples of that, but something to look out for. Um there's a problem that I've I've got couple of preferences for the the end control um I get 'em with the the red colour button for stand-by and s the other examples of that um The buttons should be large. They shouldn't be tiny little things like you get on some mobile phones. They should be easy to press, very comfortable. Um one of the examples given on the homepage was um there's an up and down volume button but both of them have a V_ on them, so the up volume button looks like it should be a down volume button, that's kinda confusing. Um should avoid s things like that. Um if the the corporate colour scheme allows it we should have a very bright colour so that it can be easily identified anywhere. Um obviously trying trying to avoid being tacky there, but it could um tie-in very easily with your your lime and lemon idea. Fantastic. Um any extra features we add beyond the basic ones should be m hidden, they shouldn't be on the um shouldn't be visible without something be opened or some sort of special extra effort. Um if we did decide to go for voice activation there sh should always be a button as alternative, possibly hidden in the the opened up section um making that something is wrong with it or with somebody's voice, maybe they got a cold or Um we should definitely avoid the big square block look. That's just wrong. And um we got an email uh from I think it's the the research department, and they've said th the voice control um can now talk back if you ask it a question. So it sh it could be good to have them um confirm any action you take and possibility. Right and these are problems I've had with it. 
Um I don't know where the slogan should go, or really what the slogan is. I think it's um, fashion into electronics. And we don't know how flexible the colour scheme is. I mean you say you wanted the the corporate colours, but they don't say you know if we can use any other colours at all or That's it. I think I'm cool. Right. Right. Um I think we could go for like um maybe not a p a fruit shape but a very sort of curvy type shape. Um you could have the same sort of texture and colour as a fruit. Um probably something that s sits in your hand comfortably, sort of feels right in your hand. See I'm I'd quite like a sort of uh snowman type shape. Um so a p sort of larger bit sits in your hand, and then you got maybe another bubble at the top for just any other function you need. Um something like that um you got two groups there um maybe it could fold up and you get a third group inside or uh you have volume controls about there. So I reckon it'd look quite nice if we just had um this here, had a sorta background yellow, and then have sort of a nice bold colour for the buttons. I think that might scare me. I think that'd probably scare me. You turn it on your control possessed s. Um Nah. Um well I think the advanced ones the the ones you don't usually use could be hid inside. B um I think the we had were fairly basic ones, they'd have to go on the the front somewhere. Um Yeah I think so yeah. Mm. Well, but then for um for skipping a large amount of channels you do have to uh to skip the channel button, the number part. Uh but Yeah. Mm.\\nOkay. Okay, well Okay. Okay we all ready to go? Well how um on the in this meeting then if we um I'll just just recap on the minutes from the last meeting. And we uh decided on decided on our our target group being fifteen to thirty five, and we decided that it was gonna be non-rechargeable battery-powered, that we're gonna group our audio-visual and other functions into into those categories, um. 
And I told you guys about the three new requirements about ignoring teletext, ignoring everything except the T_V_, and trying to incorporate the the uh corporate colour and slogan. Um so that was the last meeting. Is there anything have I forgotten anything? Is that everything? Okay. Um so if we have the three presentations, and then if you have anything to kind of that you know you're gonna want to discuss, maybe just make a note of it, and we'll have all the discussion at the end. That might be a better idea this time. And so if we start off uh with Andrew and then Craig and then David, if that's alright. Um and then after that we'll have to make some decisions about stuff, right. So if you wanna take this. Screwed in quite tightly. Uh what did uh how did we leave it with speech recognition now? We did we say we were gonna try maybe incorporate it but we hadn't made a definite decision on that? Right. Oh I should also point out that um the you know the kind of final objective of this meeting is to reach a decision on the concepts of the product. So um that's kind of the end result hopefully. Uh-huh. Hopefully appear in a wee second. Up there we go. Oh no. Oh right. Here we go. Okay. Mm-hmm. Aye a fair point definitely. Okay. Aye right. Uh-huh. Okay. Okay okay. Mm-hmm. Mm that's true, yeah. Mm 'kay. Great. Okay. Yeah. Ah. Okay. That's great. Uh-huh. Okay great. Um thank you for that. Uh Craig do you wanna uh plug yours in then? Mm. Not quite. Oh something coming now, yeah. There we go. Mm. Mm-hmm. Mm-hmm. Okay. I think it's yellow because like the website is yellow and there's a band at the bottom is yellow, so yellow, lemon, you know definitely food for thought there, but keep going and we'll discuss it after. Mm-hmm. Mm 'kay. Mm. Aye that's a good idea, yeah. Yeah. Okay. Okay. Yeah that that was very good, and uh now with David. I know it'd be handy, wouldn't it. Do y do you wanna sit in the the line of sight of this um Yeah. Mm-hmm. Hmm. Mm. Mm-hmm. 
We you couldn't have like plastic and rubber? Yeah. Mm 'kay. Okay. Mm-hmm. I'll clear one of these things for you. Just by moving it yeah. Mm. Mm-hmm. Mm. W yeah. Uh yeah yeah, I see. Yeah. I know what you mean yeah. Mm-hmm. Mm-hmm. Mm. Okay. Okay. Right. Okay. Right can I Yeah well yeah it's just I'm quite keen to get the discussion going with the time we've left so but yeah you c ask away. Mm. Why what kind of light do you want are you thinking of? Uh-huh. Yeah. Yeah yeah yeah. Like a phone yeah, like the backlight in a phone. Okay cool. Yeah. Mm. Mm. Okay. Okay. Right okay um well let's just go right back to the marketing ideas for a start, and just giving an id idea on the time, we've got about fifteen minutes to play with at most. So um yeah so just t to bear in mind that the ultimate goal of this meeting is to reach an decision on the the the concepts of the product. So back to your idea about um incorporating the idea of like fruit and veg, and the corporate colour, and things like that. Um I mean what does everybody think about Does anybody have any ideas of about how we can fit all that in together? I mean that's kind of the user interface type of thing, what are your thoughts on that? So maybe do y are we thinking something that like s could sit in your hand comfortably, or do you th you'd hold onto comfortably or So something quite curvy? Okay um right okay. Colour-wise I mean you made a re uh was it you or uh I can't remember who made the point about how if you've a nice bright colour you'll not lose it, was that Whose about how if you have a bright colour you'll not lose it so much. Um and when the corporate colour is yellow, I mean maybe we could think about about the colour of the whole product being yellow I don't know. Um And then obviously the uh the materials when it has anybody got like an overall picture in their mind about what what might work? That's all. Mm-hmm. Mm-hmm. Mm-hmm. Mm-hmm. Okay. Right. 
Oh you know like in circular in shape or Choice of material yeah. 'Cause I I I was kinda thinking about as well you know how you get these shock resistant mobile phones, and they're plastic but then also have like rubber on the outside, and it kinda feels it feels kind of warmer to the touch. It feels a bit more comfortable, and maybe we could incorporate plastic and rubber into it. And then then we could have curved shapes, 'cause wood or titanium, yeah, it's gonna have to be boxy and rectangular and I think we might be moving away from that you know so um Well I'm do we really want it in like the shape of a lemon or no I don't think we do either. Okay right well um so thoughts about the actual shape of the thing. A snowman shape? Uh-huh. That's quite a distinctive shape, that would be good wouldn't it. Yeah so yeah should we go with that? Do you wanna draw it on the board? Ooh that'd be good. Mm-hmm. So call it the snowman-shape trademark. Yeah that's cool. Um and I mean colour-wise what does everybody think? I think it is quite important to get yellow in there somewhere. I mean do you want the whole thing yellow, maybe like yellow and white do you want something Uh-huh. Okay cool. Um and also I mean how are we going to incorporate the slogan in? The fact that it talks to you, I mean it might be quite cool if when you first start using it it says, what is it, putting fashion into electronics or something, I dunno. Or when you like or if you turn it off or something if it can speak if it could actually say the slogan it might be a bit more powerful than just having it written on it somewhere. I d I d any thoughts on that at all? I know. Um unless an a I mean if you also would that work if we wanted to incorporate um an L_C_D_ display, where would we put that? Would we put that on the inside or It's bound to increase the cost of it a lot, I would've thought. Yeah. Mm oh yeah that's true. Yeah. So so no need for an L_C_D_ display? 
I think that would make it very complex. Yeah. Yeah. I don't know if there is really, no um I would say no need for a talk-back. Uh does anybody disagree with that? No? Easy. Okay um right so you're gonna have the three different sets of of functionalities, um I mean do you wanna group them into s head of the snowman, body of the snowman, inside of the snowman, is that what you're thinking? Okay. Okay right um what else do you need to talk about? So I'm just gonna um pop this in here 'cause I have a slide about decision making which I'd forgotten about. Oh sh God we've got five minutes um okay uh back we go. Um energy what do you think that's suggesting we're how we're powering the thing? I really like the idea of this kinetic thing where you'd have the back-up of the battery, but have have kinetic power, I mean what does anybody think about that? Okay. Mm-hmm. Yeah. Yeah. Well I suppose that if you're if you're away and you're not using it, then you're not using any power either. So you'd have the battery as the kind of to keep it ticking over idea I'm really sorry we're gonna have to wrap up quite quickly, we don't have as much time as I thought. Um so I think that's what energy is referring to here. Chip on print, is that that's an industrial design thing, is it David? Okay um as for the case, kind of discussed that Yeah I know we're gonna have like rubber buttons that feel kind of Okay. Mm-hmm. Mm 'kay. Could have things like backlighting the buttons and stuff like that. Aye that would be a good idea. S so like cur slightly transparent case, so it's yellow, like tinted yellow, but you can maybe see through it. Is that what you mean? Okay. Lights. Okay. Mm. Yeah. Yeah if you are holding it in your hand you could you could do that, couldn't you? If you're holding it in your hand you could Do you think? Okay. Mm. Yeah okay okay um Yeah. Okay. Okay. 
Just Okay um right well wouldn't it we do need to make a decision on whether we want to incorporate a jog dial in nice and quickly. Um I'm all for them actually, I think they're quite you know th very quick to m to use. So does anybody oppose the idea of of incorporating one into the design at all? No. And the other thing was um can we think of any way of getting the slogan into this thing? Uh-huh. Uh-huh ooh okay, we really gotta wrap up so yeah. Okay well if we can do that, great. Yeah okay. Yeah let's let's try and get the slogan on there um, and Mm. Okay. Right I'm gonna have to I'm really gonna have to hurry you on here 'cause we're we're actually over time. Um is there anything anybody's unsure about? Just for in closing just the next meeting's gonna be in thirty minutes, and so you can see in the screen here what each of you are gonna hopefully be doing, uh I know that the designers are gonna be working with Play-doh on that. So um that'll be that'll be good. Um and I'll get the the minutes up as soon as possible. Anything at all you think we haven't discussed that we need to? Is everybody kind of happy about what they're gonna be doing? Okay. That's kind of a design thing that you guys can can discuss, yeah. Okay. Yeah. To make something flush with the case? Okay right. Okay. Sp kinda grippy? Okay. Okay I'm gonna have to I'm gonna have to call this to a close 'cause we're way over time. So um that's really good, like we've s had much to talk about that um pretty much run out of time to do so. So off you go and design stuff wooh. Yeah quite jealous actually.\\n'S to do now is to decide how to fulfil what your stuff is, so in that sense so it does kind of make sense, yeah. Yep. Yep. Yep. Mm-hmm. No. Yep. Mm-hmm. Yep. It's probably not sending. Yeah. Yep, there it is. And the Play-Doh 's yellow. Cool. Mm-hmm. Mm 'kay um. Oops. Yeah okay. Let me just get this going first. Ah there it is. 'Kay, that should be it. 
Okay um I guess the same thing again, I started with something very basic. So just so you guys have some idea of what's involved in my process, um and then you can just work through it and we 'll either modify it or start from scratch um depending on what your needs are. Um the components are exactly the same. Um I think, like what you guys said, um the most input that's needed is basically in the user interface. The rest of the components um they do have an impact in terms of cost and complexity. Um like you said time to market was a problem, um and how many components are physically in there in cost. And the power is basically a factor of that. Um and the lower components, the power, the logic, the transmitter, and the infrared, um they affect you in terms of the size of your device, um and that would have some inte impact on how y I think more how you hold rather than um the actual use using the the remote control because um like we've said we've defined, like we only want the basic things that to be visible, and the rest of them we try to hide. So um you know it's just a matter of working out space. So I guess three things, um cost, um complexity, and the size. These are the three things that um will have an impact on you. So just go through it in the components. Um these are the options that are available to you, um I'm not very sure about the voice thing 'cause I got another email and it was in fact quite sketchy on what n the voice options are. Um it said it could talk to you, but it never said anything about being able to listen. I it said something about a sensor but never clarified that. So maybe if you well I could see the other email that they sent you, um 'cause they got back to me with like different requirements, or different offerings of what components availa Okay so your basic components are buttons, okay and you have a wheel available, like a mouse scroll wheel, okay there's an L_C_D_ display, um I think these are quite standard things. 
No um they're well in the sense that these are all the options available for you. I'll explain to you the complexity and the cost thing again a bit later. Okay um then there's um how the case actually looks. It can actually be flat or it can be curved, um and then the different types of materials that you can use, um I don't think you can use them in a combination, um but um I could check back for you, but I don't think you can actually use them in a combination. Um I think plastic and rubber would be fine, but plastic, rubber, and wood, I wasn't I'm not very sure about the titanium. They had some restrictions on using the rubber and the titanium. Um the rubber was a restriction on the kind of power source you could use, but the titanium had a different kind of things on the shape of the thing, so I think that there is some restriction on um I think you could probably group plastic and rubber together, wood and titanium, but you know it might be easier from a cost perspective and a complexity just to use one. You know as opposed to two. Um and the other components are logic chips, um again I'll I'll go back to the component chips. The com how complex or how easy the logic is, it depends on how many functions you have on the on the unit um and that impacts cost. Um I don't think the logic chip has a issue about size 'cause they should be about the same size. Power consumption should be about the same. Um I think the main impact is complexity, um and the other thing is um the power options. Um the first one is a standard battery. Okay the second one I think is more of a gimmick then actually a useable thing, it's a wind-up you know, a crank. Yeah but that that might be something I think that's more of a look and feel decision because I don't think you can have one power source if you're using the alternative power sources. I think whatever it is you still need a battery 'cause I don't think anybody wants to keep doing one thing. 
Okay the other ones are a solar powered cell, which may not be a great idea in Europe or any country that has seasons 'cause half the year you'd be dead. So like what I said, you probably need like a battery and something else. Um and the kinetic one I guess for me is the most interesting one because it's movement and people like to fiddle with their and it's a nice sales gimmick I think. From a marketing gimmick it it's a technology thing, it's a shake it it doesn't work, shake it, knock it or something. You know you know you have you had those balls, you know those stress balls where you bounce the ball and it and it lights up and it goes, you know that might be a gimmick combined with rubber. You know just to if you get frustrated wi remote control you can throw it, kind of you know just uh you know um so. Um okay my from my role, I don't think that personal preferences but role preferences, I think um something comfortable to hold, um small and slim I guess that's more in the sense of small and slim in terms of comfortable not so small you can't, you know like a phone or something, too small phone. Um and the other thing is from a production point of view um the less components we use and the simpler the components means you reduce your cost and you increase your profit. Um and also the time to market and the complexity of developing designing and debugging it um so. Um okay let me just go back and talk about some of the restrictions. Um The user interface restrictions basically means that if you use more complicated features, like the buttons are standard okay, the L_C_D_ panel and the scroll wheel you need more complicated logic. Um the case okay with a rubber case you can't have the solar panels. Okay with the titanium case, let me just check that um, titanium case can't be curved, it has to be square. Okay um there's no restriction on the plastic, and it can't be curved on the wood. 
So that's again, I don't think you can use them in a combination, um especially the titanium I I suspect they're very fixed to a particular need. So um mixing them may not be a good idea um yep. That's it. Um I think we could because the L_C_D_ panel requires power, and the L_C_D_ is a form of a light so that Are you thinking are you thinking of of a light in the sense of um a light light, or a light in the sense of it glows kind of you know Frankenstein, it's alive. Okay. Okay. Okay. Um that's why I think the option of the the kinetic thing which basically means as long as you shake it like a watch, like an automatic watch um it's probably sensitive enough when you fiddle it. So you could trigger that to a light, like I said the bouncing ball thing, or you could trigger that to use that to power the light as opposed to so when they pick it up, right, and then that that sorta triggers the glowingness. I think he made that. Is there a particular shape that you're interested in? Like does marketing have any research on does it need to be long? Does it need with a square thing wha Yeah 'cause that will n help narrow down the choice of Like fruit. I'm thinking fruits in my head, but that's tacky. Do we need an L_C_D_ display? What what's the functionality of that? Yeah but the question is what are we using it what would we what would we achieve from it? Putting in lights is cheap but putting in an L_C_D_ panel just to make it glow is a bit of a Mm-hmm. I think that will be a problem because we don't have an input device to get the listings into it, so um it's a bit nuts to get the Monday Tuesday Wednesday you know. Um I I'm not saying there's no need for an L_C_D_ display, but um it's what's what what would it tell the user, 'cause the L_C_D_ tends to be an output as uh as opposed to an input so um does the remote control need to talk back to the user? 
We have the option of the speaker as well the sa the same thing goes for the speaker, is there a need for the remote control to to talk back? Um You could put a game on it. When the T_V_ dies you can play with the remote control. Where would you physically position the buttons? Um I think that that has some impact on on on many things. Um maybe you wanna draw onto the Yep. No, like I said we have a h hybrid kind of thing, so it's not gonna charge the battery, it's just Yeah. Yep. Yes yes. Or even a clear case. Um you know a a glowing a a glowing yellow type case where the yellow is showable, but in the dark it sort of, it's alive. Um in in a slight subtle way. Yeah yeah. Yeah. Or or there might be a light running through it like a mouse. You know you have cordless mice and they don't eat that much power right. So the power the battery in that sense, maybe you have one or two stratig strategically placed lights that sort of Yeah but because the case is transparent so it gives it a little bit of a glow, doesn't make it freaky. The question is when you're rolling it, how do you wanna roll it? Do you want 'em to roll it like that? Do you want 'em to roll it like that? 'Cause in a mouse your hand's in a position to roll it, whereas the other thing about having it jog dial this way, it tends to get moved accidentally. That's a very unnatural motion to yeah. Can you imagine you have to scroll a lot. Um it might work for volume, and maybe some of the brightness controls and stuff like that, but not for channels right. If you have a Telewest box you've got like, you don't have to buy all the channels, you've about fifty channels, can you imagine trying to. Um and I don't think having that you know too quick too slow kin it's confusing to the I dunno. But users tend to tend to want to use that and once they lose out on the user experience they're like Because that's becomes the most accessible thing in front of Yeah. 
So you wanna expand the shape of the That that might have one problem in terms of um in terms of whether you're left handed or you're right handed you might be locking yourself in. Could I just could I just jump in and suggest something quickly? Um I think one thing would be the jog dial 'cause that's gonna have quite a big impact on the thing um Yeah that's what I was thinking the a slide, because then you you don't have to put the hand. I think incorporating a logo is quite straight forward. There's lots of space for it um Yeah but it's also a a marketing and a function Yeah. Feel like fruit. Fruits kids. No like Yeah yeah. Yeah yeah kinda like that yeah. Play-doh time. You got to choose first. No, we're kidding. Okay, can I just swipe your power cable, I don't think it matters. Okay lemme okay, I'm gonna pull everybody out first and then put in whoever needs to be left. It's you. Argh. This is a real hassle and a oops. I'm gonna take the microphones, 'cause it's too lazy t take them off again. Cool.\""
]
},
"metadata": {},
"execution_count": 28
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "iJ3crA_B7dCu",
"outputId": "749d70eb-e843-405a-e969-b1ca24033a36"
},
"source": [
"# total characters in the file\n",
"len(FileContent) "
],
"execution_count": 29,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"39719"
]
},
"metadata": {},
"execution_count": 29
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "flBNsq8NfY0Y"
},
"source": [
"# Load the Model and Tokenizer"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"id": "By4ADnD40Zmt",
"outputId": "91260147-98ed-4377-d94e-7a7eaf093bc6"
},
"source": [
"# import and initialize the tokenizer and model from the checkpoint\n",
"from transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n",
"\n",
"checkpoint = \"sshleifer/distilbart-cnn-12-6\"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(checkpoint)\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)"
],
"execution_count": 42,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WKVut2I8ffeG"
},
"source": [
"# Some model statistics"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "kfeTQ3u_0_nc",
"outputId": "7641821c-905c-4042-c1a7-5aca2531dc2f"
},
"source": [
"# max tokens including the special tokens\n",
"tokenizer.model_max_length "
],
"execution_count": 43,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1024"
]
},
"metadata": {},
"execution_count": 43
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "u_faVn_JS5NV",
"outputId": "e88c8fa6-f154-4ab1-e450-7a59870d8430"
},
"source": [
"# max tokens excluding the special tokens\n",
"tokenizer.max_len_single_sentence "
],
"execution_count": 44,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1022"
]
},
"metadata": {},
"execution_count": 44
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "jki18a4yNyCw",
"outputId": "80fb8260-4cf4-41bc-e81c-14183e46e647"
},
"source": [
"# number of special tokens\n",
"tokenizer.num_special_tokens_to_add() "
],
"execution_count": 45,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"2"
]
},
"metadata": {},
"execution_count": 45
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kubwFM6MX7rR"
},
"source": [
"# Convert file content to sentences"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 51
},
"id": "Y41E2n-BXjRr",
"outputId": "2910c1ba-dcf1-4df2-fc1e-595f3442453a"
},
"source": [
"# extract the sentences from the document\n",
"import nltk\n",
"nltk.download('punkt')\n",
"sentences = nltk.tokenize.sent_tokenize(FileContent)"
],
"execution_count": 47,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"[nltk_data] Downloading package punkt to /root/nltk_data...\n",
"[nltk_data] Package punkt is already up-to-date!\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "S7pHJ31xbJ6O",
"outputId": "f2305363-e5cd-4b3a-9097-2743b3784040"
},
"source": [
"# number of tokens in the longest sentence (special tokens excluded)\n",
"max(len(tokenizer.tokenize(sentence)) for sentence in sentences)"
],
"execution_count": 48,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"93"
]
},
"metadata": {},
"execution_count": 48
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5amp-uoNezvV"
},
"source": [
"# Create the chunks"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "SO-QPuAoY6BH",
"outputId": "63b2c558-194f-4474-95a4-8d217aaefecb"
},
"source": [
"# greedily pack whole sentences into chunks that fit the model's input limit\n",
"max_tokens = tokenizer.max_len_single_sentence  # limit excluding special tokens\n",
"length = 0   # number of tokens in the current chunk\n",
"chunk = \"\"   # text of the current chunk\n",
"chunks = []\n",
"\n",
"for sentence in sentences:\n",
"  sentence_length = len(tokenizer.tokenize(sentence))\n",
"\n",
"  # if the sentence would overflow the current chunk, save the chunk and start a new one\n",
"  if chunk and length + sentence_length > max_tokens:\n",
"    chunks.append(chunk.strip())\n",
"    length = 0\n",
"    chunk = \"\"\n",
"\n",
"  # add the sentence to the current chunk and update the length counter\n",
"  chunk += sentence + \" \"\n",
"  length += sentence_length\n",
"\n",
"# save the final chunk, even when the last sentence started a new chunk\n",
"if chunk:\n",
"  chunks.append(chunk.strip())\n",
"\n",
"len(chunks)"
],
"execution_count": 49,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"10"
]
},
"metadata": {},
"execution_count": 49
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b3MjwOU4-vWZ"
},
"source": [
"# Some checks"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "lANWYPQsa7mV",
"outputId": "e6a8426e-9fb9-4ac6-871e-17e3c7742e12"
},
"source": [
"# tokens per chunk, excluding special tokens (each should be <= 1022)\n",
"[len(tokenizer.tokenize(c)) for c in chunks]"
],
"execution_count": 53,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[1014, 984, 960, 1003, 1020, 988, 996, 998, 1017, 576]"
]
},
"metadata": {},
"execution_count": 53
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "0tQe2jJya_HV",
"outputId": "9ce43d2e-76a1-4d2f-cc93-1edb0a4a7641"
},
"source": [
"# tokens per chunk, including the 2 special tokens (each should be <= 1024)\n",
"[len(tokenizer(c).input_ids) for c in chunks]"
],
"execution_count": 57,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[1016, 986, 962, 1005, 1022, 990, 998, 1000, 1019, 578]"
]
},
"metadata": {},
"execution_count": 57
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "el24gv0f-9DJ"
},
"source": [
"## With special tokens added"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "Mq-tLbFtFYHR",
"outputId": "c0306d10-4f97-4b59-9f96-e76e792c0327"
},
"source": [
"# total tokens across all chunks (2 special tokens are added per chunk)\n",
"sum(len(tokenizer(c).input_ids) for c in chunks)"
],
"execution_count": 50,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"9576"
]
},
"metadata": {},
"execution_count": 50
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 51
},
"id": "yokzuZmuGAQ_",
"outputId": "a89486b0-8383-4195-ed95-4d5bf8663fb9"
},
"source": [
"# tokens in the whole document; this exceeds the model limit, so a warning is expected\n",
"len(tokenizer(FileContent).input_ids)"
],
"execution_count": 51,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"Token indices sequence length is longer than the specified maximum sequence length for this model (9561 > 1024). Running this sequence through the model will result in indexing errors\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"9561"
]
},
"metadata": {},
"execution_count": 51
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qe0UKJZr-1Qt"
},
"source": [
"## Without special tokens added"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "u1nynSGcFiSh",
"outputId": "875cd7c4-9a5e-42b2-d166-bfac85c07188"
},
"source": [
"sum([len(tokenizer.tokenize(c)) for c in chunks])"
],
"execution_count": 54,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"9556"
]
},
"metadata": {},
"execution_count": 54
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "JdnNz9D2GFAz",
"outputId": "cde8df98-7ca1-40fa-894d-8c04312b2907"
},
"source": [
"len(tokenizer.tokenize(FileContent))"
],
"execution_count": 55,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"9559"
]
},
"metadata": {},
"execution_count": 55
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZPluCcpKgGTi"
},
"source": [
"# Get the inputs"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"id": "u98_-5RNhkEX",
"outputId": "7d27f4c2-b22c-48ff-e14e-abc40f5ff54e"
},
"source": [
"# inputs to the model\n",
"inputs = [tokenizer(chunk, return_tensors=\"pt\") for chunk in chunks]"
],
"execution_count": 17,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iKQKEdo8eakp"
},
"source": [
"# Output"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 425
},
"id": "REi6Ua8ejZ71",
"outputId": "9a24e0b4-decc-4c59-d726-40679fae693b"
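},
"source": [
"# summarize each chunk separately and print the summaries in document order\n",
"for model_input in inputs:  # avoid shadowing the built-in input()\n",
"  output = model.generate(**model_input)\n",
"  # generate() returns a batch of sequences; each chunk is a batch of one\n",
"  print(tokenizer.decode(output[0], skip_special_tokens=True))"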
],
"execution_count": 18,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
" Market research shows that TV remote control has a fancy look and feel, not a functional look or or feel, the number one thing that was found was that television remote control was not functional. Number two was that it be innovative without a adding unnecessary functional bits to it, and third priority is that it has to be user friendly while still having technology.\n",
" Style is number one thing in the in the market of who we're selling to. Innovative design technology's also a must in that it's seen it'd be seen to be uh cutting edge, but ease of use t has to be insured throughout. And then at the end there are vibrant natural colours.\n",
" We need to have something that unifies a lot of the different concepts, and if we think that what we are w our number one marketing motive is the look and feel. We are leaning quite a bit to the side of being low-tech, rubber buttons plastic frame, it's almost like we're reproducing the remote control that's out there. We're gonna need to put in a really good battery so people don't have to charge their r remote control every few days.\n",
" Could we use like a jog dial, like a nice just sort of round, round, somewhere on it where you just roll it? Or why don't we do it like a mouse then? Or maybe just fit it in like down the middle here. Jog dials are much easier than that, the alternative is to go push the button. The buttons should be easy to press, they shouldn't be tiny little little little buttons like mobile phones.\n",
" Andrew, Craig and David, Andrew and Craig take part in a meeting to discuss the product. The final meeting will be held at the end of the meeting. Andrew says he wants to make a decision on the concepts of the product, and then after that we'll have to make some decisions about stuff. Craig says he'd quite like a snowman type shape, a p sort of larger bit sits in your hand, and a bubble at the top for just any other function you need.\n",
" Mm-hmm. I think it's yellow because like the website is yellow and there's a band at the bottom is yellow, so yellow, lemon, you know definitely food for thought there, but keep going and we'll discuss it after. Mm 'kay. Do y do you want to sit in the the line of sight of this um Yeah. I'll clear one of these things for you. Just by moving it yeah. I know it'd be handy, wouldn't it. I mean do y are we thinking something that like s could sit in your hand comfortably, or do you th you'd hold onto comfortably or something quite curvy?\n",
" Or when you like or if you turn it off or something if it can speak if it could actually say the slogan it might be a bit more powerful than just having it written on it somewhere. I d I d any thoughts on that at all? I don't know if there is really, no um I would say no need for a talk-back. Or if we wanted to incorporate an L_C_D_ display, where would we put that on the inside or It's bound to increase the cost of it a lot, I would've thought. I think that would make it very complex. So so no. need for an L.C_DD_ display? I think\n",
" The most input that's needed is basically in the user interface. The rest of the components do have an impact in terms of cost and complexity. The power is basically a factor of that. And the lower components, the power, the logic, the transmitter, and the infrared, affect you in. size of your device. The case can actually be flat or it can be curved, and then the different types of materials that you can use.\n",
" The kinetic one I guess for me is the most interesting one because it's movement and people like to fiddle with their and it's a nice sales gimmick I think. From a marketing gimmick it it it is a technology thing. The less components we use and the simpler the components means you reduce your cost and you increase your profit. Um and the other thing is from a production point of view.\n",
" The case is transparent so it gives it a little bit of a glow, doesn't make it freaky. Or or there might be a light running through it like a mouse. I think incorporating a logo is quite straight forward. There's lots of space for it um Yeah but it's also a marketing and a function.\n"
]
}
]
}
]
}
@itayalm

itayalm commented Apr 23, 2022

Thank you, very nice :)

@NamraRehman

NamraRehman commented Aug 17, 2022

It gives extractive summarization, not abstractive.
For extractive summarization there are many easy algorithms that give good accuracy, such as gensim, Luhn, etc.
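
To illustrate what "extractive" means here, below is a minimal sketch of a frequency-based sentence scorer in pure Python. It is only in the spirit of Luhn's method, not Luhn's algorithm itself and not gensim's implementation; the function names, the tiny stop-word list, and the naive sentence splitter are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools ship larger lists).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def frequency_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, copied verbatim, in original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = tokenize(sentence)
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]
```

Because the output is a subset of the input sentences, a summary produced this way never contains wording that is absent from the source, which is the defining property of extractive summarization.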

@SumanthSrungavarapu

It gives extractive summarization, not abstractive. For extractive summarization there are many easy algorithms that give good accuracy, such as gensim, Luhn, etc.

Yeah, I observed that too. Have you found out why, or found any alternatives for abstractive summarization?

@NamraRehman

Yes, Google's Pegasus model is the state-of-the-art pre-trained model for abstractive text summarization.

@SumanthSrungavarapu

Yes, Google's Pegasus model is the state-of-the-art pre-trained model for abstractive text summarization.

I tried that too: I used the Pegasus model pretrained on the CNN dataset, available on Hugging Face, but it also gives extractive results. If you have seen abstractive results, could you please share the notebook or code?
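
One way to put a number on "extractive results" is to measure how many of a summary's n-grams never occur in the source text. The sketch below is a hypothetical helper, not part of the notebook or of any library; `novel_ngram_ratio` and its bigram default are assumptions made for this example.

```python
import re


def ngrams(text, n):
    """Lowercased word n-grams of a text, as tuples."""
    words = re.findall(r"\w+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def novel_ngram_ratio(source, summary, n=2):
    """Fraction of the summary's n-grams that never occur in the source.

    0.0 means every n-gram was copied (fully extractive);
    values near 1.0 mean mostly new wording (more abstractive).
    """
    summary_ngrams = ngrams(summary, n)
    if not summary_ngrams:
        return 0.0
    source_ngrams = set(ngrams(source, n))
    novel = sum(1 for g in summary_ngrams if g not in source_ngrams)
    return novel / len(summary_ngrams)
```

Running this on a model's output against the original transcript gives a rough, comparable score across models, so a claim that one checkpoint is "more abstractive" than another can be checked rather than judged by eye.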

@NamraRehman

NamraRehman commented Feb 22, 2023

Yes, Google's Pegasus model is the state-of-the-art pre-trained model for abstractive text summarization.

I tried that too: I used the Pegasus model pretrained on the CNN dataset, available on Hugging Face, but it also gives extractive results. If you have seen abstractive results, could you please share the notebook or code?

Yes, I used the pretrained model from the link below, and it output abstractive summaries.

https://towardsdatascience.com/how-to-perform-abstractive-summarization-with-pegasus-3dd74e48bafb

The link also has the fine-tuning code. I fine-tuned Pegasus on my legal text dataset, but I'm unable to predict the summaries and evaluate the model. If you find or write the code, please share it with me as well.

@VincentOracle

Yes, Google's Pegasus model is the state-of-the-art pre-trained model for abstractive text summarization.

I tried that too: I used the Pegasus model pretrained on the CNN dataset, available on Hugging Face, but it also gives extractive results. If you have seen abstractive results, could you please share the notebook or code?
