@brylie
Created December 28, 2023 07:32
Code with Brylie - e41
(...) Hello and welcome to an open source live code hangout. I don't know what episode this is, but we're going to work on a task today. I should probably just drop the episode numbering, but I was having such bandwidth problems last time that the episode got broken up into five or six parts. We are going to work on making fake data.(...) I think we're at episode 42.(...) I've got my ectoplasm.(...) Basically, I need some data in this project so that we can develop new features and show how the project works without manually creating a bunch of demo data.(...) So I'm going to create a single command that builds out a small system with homes, residents, and some activities, just to get a feel for it. I'll try to do that within one hour. Then I'll build on that learning in a subsequent task, probably for more elaborate dummy data.(...) You can see our source code is on github.com/GeriLife/caregiving, and I'm working on the following issue. I just realized...(...) There you go. I had turned off the StreamElements overlay. I'll have to work on the overlays a little bit more. We want the chat overlay but not the emoji bomb one. I have to disable that one because it really slows down my computer.(...) Alright, so let's start a branch. I've got the code locally here.
(...)
Create the initial make_fake_data command. This goes in the common app, because it's going to make fake data for all of the apps, or most of them. Yeah, almost all of the apps here have some kind of model. Let me just reload this; there are some good support improvements for the Django and Ruff plugins here. Excellent. Alright, so we have a common app. Inside each Django app there's a common structure, and we can create custom command-line commands, called management commands. So I can create a new file in management/commands, and the name of the file is the command you want to type: make_fake_data.py.(...) So, management commands, make_fake_data, and there are a couple of examples here where I've got other management commands.(...) Here's one: make_fake_activities.(...) Oh, that one has been defined, but it's outdated because I refactored the activities.(...) We'll take a quick look at the structure, but essentially it's a class that inherits from BaseCommand, and we just call it Command by convention. I'll need a bunch of factories so I can create all sorts of data, but we'll start here with the homes.(...) So I'll bring in my imports, and we need a couple of these: we've got Resident and Residency.(...) The difference is that a resident is a person, and a residency is when a person lives at a home for a particular time frame, between move-in and move-out.(...) Believe it or not, that probably sounds obvious, but it wasn't obvious when we first started creating this prototype that people move in and move out. As you build a project, you're making a model of the world. Certain details either aren't relevant or aren't obvious when you start out, and that changes over time.(...) 
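The resident/residency distinction above can be sketched in plain Python. This is a hypothetical illustration, not the project's Django models. The class names echo the transcript, but the fields (first_name, last_name, home_name, move_in, move_out) and the choice to treat the move-out date as exclusive are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Resident:
    """A person, independent of where they live."""

    first_name: str
    last_name: str


@dataclass
class Residency:
    """A time-bounded link between a resident and a home (hypothetical fields)."""

    resident: Resident
    home_name: str
    move_in: date
    move_out: Optional[date] = None  # None: the resident still lives there

    def is_active_on(self, day: date) -> bool:
        # Assumption: move_out is exclusive (the first day no longer living there).
        return self.move_in <= day and (self.move_out is None or day < self.move_out)


# One person, two consecutive residencies at different homes.
person = Resident("Ada", "Example")
first_stay = Residency(person, "Home A", date(2020, 1, 1), date(2021, 6, 1))
second_stay = Residency(person, "Home B", date(2021, 6, 1))
```

Keeping the person separate from the stay is what allows one resident to move between homes without duplicating their record.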
Okay, so the structure is this: we have a Command class and we want some help text. I'm probably not going to need any arguments on this one, since it's just going to set up the app, so there's just the handle function. Anyway, let's grab this whole thing real quick, and you can see that Copilot was already going to suggest some things for me.(...) For now, it's just going to create fake homes, residents, and residencies. Let's see what Copilot suggests if I go there for a minute. It suggests a home factory create, a resident create, and a residency create linking the home and resident. So this is cool: that would create five homes, with each home having one resident.(...) I'm going to create some configuration variables here, so, and
(...)
these could become arguments: number of homes, 10; number of residents per home, five is good. I'm not going to make it too complicated by having multiple residencies per resident at this point.(...) I just realized, I just realized
(...)
this should be a range, and 15... let's see, let's see, 7 to 60. It's big. The reason is we want to have a distribution of activity levels.(...) I'm going to give Copilot a little more context about where we're heading down the road: the metrics app and its resident activity model. That way Copilot can inspect all of this
(...)
Let me just see if I need any imports for that. I'm going to select a random number in the range 7 to 60 with random.choice, so we need to import random.
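One detail is worth checking when picking "a random number in the range 7 to 60". Python's range() excludes its stop value, while random.randint() includes both ends. The sketch below uses hypothetical constant names and passes a seeded random.Random instance, which keeps runs reproducible.

```python
import random

# Hypothetical names for the bounds discussed above (7 to 60 activities per resident).
ACTIVITIES_PER_RESIDENT_MIN = 7
ACTIVITIES_PER_RESIDENT_MAX = 60

# range() excludes its stop value, so add 1 to make the maximum reachable.
ACTIVITIES_PER_RESIDENT = range(ACTIVITIES_PER_RESIDENT_MIN, ACTIVITIES_PER_RESIDENT_MAX + 1)


def pick_activity_count(rng: random.Random) -> int:
    """Uniformly pick how many activities one resident gets."""
    # Same distribution as rng.randint(7, 60), whose upper bound is inclusive.
    return rng.choice(ACTIVITIES_PER_RESIDENT)
```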
(...)
We're just starting at the home level right now. Okay, so we've got all the context there. Let's see what Copilot generates now for creating homes.(...) So it
(...)
Hmm, craziness. So we get 10... okay, this is the problem, though. No, maybe not. It knows that we can create 10 residents, and then it knows that we need to link those residents with the residency factory. Wow. And then it knows we're going to get a random number of resident activities. But here's where it gets a bit tricky: this is not correct
(...)
and if I check this, this is really cool. I didn't know about the create_batch syntax, or the bulk_create. This is great. If I check this... I would like it to be within the last 30 days, and by default
(...)
the activity date is just a date field, so we're not passing anything in. Though that isn't the factory, that was the model. Okay, so it's using Django's bulk_create here, which is cool
(...)
but I think if I don't use such syntactic sugar here, and use a regular loop, it'll be a little easier to follow and maybe less error-prone. I'm not sure this will even work. bulk_create can take an iterable, so this would create an iterable of resident activities. The problem is that this is too fancy. I want to create a random date within the last 30 days here, and I can't really do that in this. This would generate an
(...)
activity
(...)
equals... Now, this is a good point. It's looking at the residency start date and making a meaningful choice based on that. That's a good consideration: we want the activities to be within the residency time frame
(...)
but for the time being
(...)
I'll just create each activity within the last 30 days. So now, today, today
(...)
equals today's date, and the activity date equals today minus
(...)
random integer
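The "today minus a random integer" approach above can be combined with the residency-window idea Copilot suggested earlier. This is a sketch under assumptions: the function name and parameters are hypothetical, last_n_days=30 gives 31 candidate dates (0 to 30 days ago), and move-out is treated as exclusive.

```python
import random
from datetime import date, timedelta
from typing import Optional


def random_activity_date(
    rng: random.Random,
    today: date,
    last_n_days: int,
    move_in: Optional[date] = None,
    move_out: Optional[date] = None,
) -> Optional[date]:
    """Pick a random date in the last N days, optionally clipped to a residency.

    Returns None when the residency does not overlap the last N days.
    """
    earliest = today - timedelta(days=last_n_days)
    if move_in is not None:
        earliest = max(earliest, move_in)
    latest = today
    if move_out is not None:
        # Treat move_out as exclusive: the last day in residence is the day before.
        latest = min(latest, move_out - timedelta(days=1))
    if earliest > latest:
        return None
    # randint is inclusive on both ends, so both earliest and latest can be chosen.
    return earliest + timedelta(days=rng.randint(0, (latest - earliest).days))
```

Without move_in and move_out this reduces to the plain "within the last 30 days" version.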
(...)
random type
(...)
So let's look side by side real quick; it helps my mind. Where is my... resident activity date,
(...)
residency home
(...)
activity type, caregiver role. This is interesting; let's see what this does. I don't know if I need index zero on this one. I think it should just return... does it return a random list?
(...)
I didn't want to go in this direction.
(...)
Actually, this is from the metrics models, from... from
(...)
import random. What does that do?
(...)
activity type.(...) Okay, so it's returning a tuple and I need index zero. That's right.
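As found above, random.choice() over a Django choices list returns a (value, label) tuple, so only index zero is the value to store. The stdlib-only sketch below uses hypothetical placeholder choice values, not the project's actual activity types. Tuple unpacking stands in for indexing with [0].

```python
import random

# Hypothetical stand-in for a Django choices list such as SomeTextChoices.choices:
# each entry is a (stored_value, human_readable_label) tuple.
ACTIVITY_TYPE_CHOICES = [
    ("outdoor", "Outdoor"),
    ("social", "Social"),
    ("music", "Music"),
]


def random_choice_value(rng: random.Random, choices) -> str:
    """Return only the stored value of a randomly chosen (value, label) pair."""
    value, _label = rng.choice(choices)  # same as rng.choice(choices)[0]
    return value
```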
(...)
and then activity minutes random integer between
(...)
30 and 120
(...)
I shouldn't really use these
(...)
magic numbers here
(...)
and here I've got a range
(...)
0 to 30, so this will be that, and then a random activity minutes range,
(...)
30 to 120. There we go. Yeah, that way we have explicit names for things, and I can configure the whole thing mostly up here, with all the parameters right there at the top. These could be arguments to the
(...)
command, but I'm not sure we really need to do that. Let's see what's happening here. Okay, I have a syntax error, so I need to do that,
(...)
something like that let me see here real quick
(...)
Okay, last N days. I still had my caps lock on; there it goes. So we're creating a resident activity object with activity minutes and caregiver role,
(...)
uh yeah
(...)
random.choice over the caregiver role choices.(...) And then group activity ID: more or less, we'll just generate that. This is nullable
(...)
equals... now, a UUID field,
(...)
which is nullable, so let's leave that alone
(...)
and we'll consider it later
(...)
So I think we've gotten down to the level of detail I was hoping for: create homes, residents, residencies, and resident activities. The code is fairly readable.
(...)
so I'll just put these here so I'm not inlining so much
(...)
And then activity minutes also: I do the logic first and then use the value, activity minutes,
(...)
activity type, activity minutes.(...) There we go.
(...)
So there's nothing to think about here. I just see a direct mapping, and I can check the logic up here. It just helps me organize my thinking a bit more.
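The pattern described above has two parts: named ranges at the top instead of magic numbers, and computing each value before mapping it onto the fields. It might be sketched like this. The constant names and the activity_date/activity_minutes keys are hypothetical; only the 0 to 30 day and 30 to 120 minute ranges come from the session.

```python
import random
from datetime import date, timedelta

# Named ranges at the top instead of magic numbers (both ends inclusive).
ACTIVITY_DAYS_AGO_RANGE = (0, 30)
ACTIVITY_MINUTES_RANGE = (30, 120)


def random_in(bounds, rng: random.Random) -> int:
    low, high = bounds
    return rng.randint(low, high)


def make_activity_fields(rng: random.Random, today: date) -> dict:
    # First compute every value...
    days_ago = random_in(ACTIVITY_DAYS_AGO_RANGE, rng)
    activity_minutes = random_in(ACTIVITY_MINUTES_RANGE, rng)
    activity_date = today - timedelta(days=days_ago)
    # ...then map them one-to-one onto the fields, e.g. as keyword arguments
    # to a hypothetical ResidentActivity(...) constructor.
    return {
        "activity_date": activity_date,
        "activity_minutes": activity_minutes,
    }
```

Changing the generated data then means editing only the constants, not the loop body.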
(...)
all right so I think we can test this out
(...)
That this is a range, and it's something that... so this is kind of strange.
(...)
I could count it
(...)
so so let's see
(...)
equals zero
(...)
Well, that might... I know what I'll do: residents created. And this is not correct,
(...)
since I'm choosing
(...)
a number from a range so for each of these
(...)
so
(...)
plus equals one
(...)
residencies created plus equals one
(...)
activities created. I don't know if I have to save that; I don't think so. Plus equals one.
(...)
So there we go. The reason I'm counting both of these is that we may decide to have differing numbers of residencies and residents. There would always be
(...)
as many residencies as residents, or more.
(...)
Let's try it out. So I reset the database
(...)
just by deleting it, and we'll migrate in all the schema changes.
(...)
drink a little bit of ectoplasm
(...)
Now we'll try the command. The command name is just the Python file name without the extension. Can't open... right, it's python manage.py, because it's a management command.
(...)
all right
(...)
I see what I did wrong here. This is essentially copy-and-paste code.(...) Oops,
(...)
or in other words, it's LLM-generated code, and I didn't review it closely enough. So what I can do is
(...)
if we want, I could put tqdm in the project so we can see the progress of this. I can hear my CPU working. tqdm could be a good addition; it doesn't bring in too many dependencies, I think.
(...)
You just take an iterable you're doing something with, and it shows you what's going on. Looks like there are some security improvements on(...) PyPI, and maybe potentially some abuse going on:
(...)
"...malicious users and malicious projects... in the past week has outpaced our ability to respond."
(...)
Crazy. Happy holidays, man.
(...)
So, still working on it. A little bit of progress output would be good;
(...)
that's an enhancement I can make later.
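Here is a minimal sketch of the tqdm idea mentioned above. tqdm(iterable, desc=...) wraps any iterable and draws a progress bar as it is consumed. The fallback wrapper lets the code run even without the dependency. The create_homes function and the HomeFactory name in the comment are hypothetical placeholders.

```python
# tqdm is optional here: fall back to a pass-through wrapper when it is not installed.
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        return iterable


def create_homes(number_of_homes: int) -> list:
    created = []
    # tqdm wraps any iterable and draws a progress bar (on stderr) as it is consumed.
    for index in tqdm(range(number_of_homes), desc="Creating homes"):
        # Placeholder for a factory call such as a hypothetical HomeFactory.create().
        created.append(f"Home {index + 1}")
    return created
```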
(...)
yeah
(...)
So I think the bulk of it is coming from here. I probably have too big a range of activities; we're basically multiplying all of these together.
(...)
I should have started smaller. That's another reminder: when working with data, work in small batches at first. Take a subset of the data, or emulate a smaller domain than you would otherwise be working with, until you've got your code worked out. Then apply it to the rest of the data. When writing database queries, for example, you can select a limited number of rows and try your aggregation on those, or in pandas you can subset rows in a DataFrame. But these are mistakes we make over and over; it's an easy mistake to make.
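The slowdown noted above likely comes from multiplication: homes × residents per home × activities per resident. Below is a back-of-the-envelope helper for checking the size before running. It assumes each count is drawn uniformly from an inclusive (low, high) range and that there is one residency per resident. The function name is hypothetical.

```python
def expected_row_counts(number_of_homes, residents_per_home, activities_per_resident):
    """Expected number of rows when counts are drawn uniformly from inclusive (low, high) ranges."""
    # The mean of a uniform integer range [low, high] is (low + high) / 2.
    mean_residents = sum(residents_per_home) / 2
    mean_activities = sum(activities_per_resident) / 2
    residents = number_of_homes * mean_residents
    return {
        "homes": number_of_homes,
        "residents": residents,
        "residencies": residents,  # one residency per resident
        "activities": residents * mean_activities,  # the counts multiply level by level
    }


# The small test configuration used later in the session: 3 homes, 2-3 residents, 2-3 activities.
small = expected_row_counts(3, (2, 3), (2, 3))
```

For that small configuration this gives 7.5 expected residents and 18.75 expected activities. That is a quick feedback loop before scaling the ranges back up.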
(...)
But you can see we're getting data written to the database, and we're not having any errors, so
(...)
the code is fairly readable, and it seems to be doing what I expect it to do.
(...)
We just don't have any progress output, which is natural. I don't know if there's a Django
(...)
tqdm progress bar built in.
(...)
Django tqdm... I don't know if I really need an integration.
(...)
It just adds it to self, and that didn't quite make sense. If I import tqdm and use it here on this line, it's not that
(...)
much different. I'm not sure what the value of this is, on top of adding another dependency. Oh, here we go: created 10 fake homes, 10 fake residencies... wow, 70 fake residencies.
(...)
So there's something wrong there, and then 2,000 activities. So what happened here with the numbers?
(...)
so
(...)
Am I printing the wrong thing?
(...)
Yeah, I don't know why we have so many of those. Maybe there's something in my residency factory... get_or_create?
(...)
Crazy. Let's run the server. Let's create a superuser.
(...)
super secret password
(...)
run the server
(...)
Let's go to the homes page first, just to check. Hey, it's looking good, though.
(...)
I'm just a little bit concerned about the number of residencies I don't know what happened there
(...)
But you see, the distribution of activity levels is looking random, and I think that's kind of natural. There's a lot of entropy in reality, and in these caregiving systems. But we have the full range: some inactive, some low, some medium or moderate, and some high. Very good. So at least in one case we have the full distribution here, so let's check that case out.
(...)
We have all the residents, very good. We have the activity types; this is great. We can do the same with work eventually, too. Work is different from activities. Activities are what the residents do for fulfillment; work is what the professionals do to make sure things are clean and operating efficiently, like medication and sanitation work. We want to get a sense of the overall wellness of the(...) home, of the institution. We want to make sure the staff are not overburdened and are doing well, and that the residents are ultimately benefiting and leading fulfilling lives in a safe and clean environment. So yeah, a pretty natural distribution: lots of types of activities, roughly proportional
(...)
but random. And then if I look at a given resident, I have to log in. Now, that actually shouldn't be the case;
(...)
it should be the opposite. I should have to log in to view the home, but not the resident. We have this
(...)
idea of security through obscurity. We want the family to be able to view the resident anonymously, and we're providing only anonymous, non-personally identifiable information here,(...) potentially just the first initial and last initial. In any case, they can see the activity just by knowing the URL. As you saw, when I was viewing the homes I didn't have to be logged in, so I have to fix that. This is a prototype right now; it isn't deployed in any system. So at this point I can commit this. I mean, it worked; it just generated too many residencies. If we go to the admin section,(...) I can verify that we have residencies:
(...)
71.
(...)
Residents: 77.
(...)
So is that actually correct? I mean, they should match.
(...)
That's a big concern: 71 residencies, 77 residents.
(...)
All right, so the number of residents is a random choice.
(...)
uh okay
(...)
here's the problem my counter
(...)
I don't need a counter here
(...)
because I'm choosing that number all right
(...)
so
(...)
For consistency, I could do this:
(...)
it's the length of the residencies
(...)
so
(...)
oh and
(...)
I'm in an iterator here,
(...)
so if I do want to count, I might as well count the output. I could also just plus-equals the chosen number, assuming this works correctly; six of one, half a dozen of the other. And then this is just individual, one at a time, here.
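The counter fix above counts the output instead of incrementing a separate counter. The sketch below uses plain lists to stand in for factory calls. factory_boy's create_batch() does return a list of created objects, but the function and ResidentFactory names here are hypothetical.

```python
import random


def create_residents_with_residencies(rng: random.Random, number_of_homes: int, residents_per_home):
    """Hypothetical stand-in for the factory calls; returns (residents_created, residencies_created)."""
    residents_created = 0
    residencies_created = 0
    for home_index in range(number_of_homes):
        count = rng.randint(*residents_per_home)
        # Stand-in for something like ResidentFactory.create_batch(count), which returns a list.
        residents = [f"resident-{home_index}-{i}" for i in range(count)]
        residencies = [(resident, home_index) for resident in residents]
        # Count what was actually created rather than keeping an independent counter in sync.
        residents_created += len(residents)
        residencies_created += len(residencies)
    return residents_created, residencies_created
```

With one residency per resident, the two totals must match. That is the check that exposed the mismatch in the admin.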
(...)
Now we should be good. Number of activities per resident,
(...)
and this is one at a time here
(...)
All right, let's do this test. I want to get these counts correct: delete the database again, make my tolerances smaller, and make the amount we generate lower.
(...)
Number of residents per home: two to three. Number of activities per resident: two to three. Activity days ago is fine, and activity minutes too. Number of homes: three.
(...)
migrate the database structure
(...)
Then we run our command, make_fake_data, and here we go: we have matching residents and residencies, and then the fake activities.
(...)
It's looking good. All right, so let's just say five homes. Residents per home can be three to five, and the number of activities per resident should be seven through 30.(...) 30.
(...)
This is the benefit of having the configuration all live up here. I don't have to scroll through and grok all of the code to change the variables; I can just think at a high level of abstraction, without the implementation details. Now what I will do is just add a(...) it's the same, okay,
(...)
docstring there, and we'll run it one more time. Delete... call it in the morning.
(...)
migrate the database changes
(...)
And make_fake_data takes a bit longer now.
(...)
so
(...)
So, residents and residencies still match. We have five fake homes and 286 resident activities.
(...)
This will be really helpful for demoing the system for development.
(...)
Okay, so now you can see though with the random choice, the distribution of... That's interesting too. I'll have to truncate that. Distribution of activity levels is not as good.
(...)
There's not as big of a chance.
(...)
So I'll have to tune it, like tweak it a bit.
(...)
Either increasing the size, which takes longer,(...) or using a different probability distribution. This random choice is just a uniform distribution.
(...)
This is a uniform distribution,(...) which may be fine.
(...)
We could put more intelligence in there that each...
(...)
We should have at least one resident per activity range.
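Stratified assignment is one way to get at least one resident per activity range, instead of relying on a uniform draw. It cycles through the levels and draws a count within each level's range. The level names follow the chart (inactive, low, moderate, high), but the numeric ranges are hypothetical; the session only mentions, tentatively, that high starts at 10.

```python
import itertools
import random

# Hypothetical inclusive activity-count ranges per level for one 7-day window.
LEVEL_RANGES = {
    "inactive": (0, 0),
    "low": (1, 4),
    "moderate": (5, 9),
    "high": (10, 14),
}


def stratified_activity_counts(rng: random.Random, number_of_residents: int):
    """Assign target levels round-robin, then draw a count uniformly within each level."""
    levels = itertools.cycle(LEVEL_RANGES)  # iterates over the dict's keys, repeating
    assignments = []
    for _ in range(number_of_residents):
        level = next(levels)
        low, high = LEVEL_RANGES[level]
        assignments.append((level, rng.randint(low, high)))
    return assignments
```

With at least four residents, every level appears. Within a level, the counts are still random.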
(...)
So that's an enhancement. I'll have to wrap up this session.(...) We got basic data generation working, though. That's a perfect achievement. Low activity threshold... I think the simplest thing I can do now is just increase the number of activities per resident.
(...)
Or shorten the range of days ago.
(...)
We can see they're mostly low and inactive.(...) If we want some people to be higher activity,
(...)
the activity threshold is based on the last seven days.
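Since the activity level is based on the last seven days, a resident's level can be computed from their activity dates, as in this sketch. The thresholds are hypothetical placeholders; the project keeps the real values in a global constant. The window is assumed to be today plus the six previous days.

```python
from datetime import date, timedelta

WINDOW_DAYS = 7
# Hypothetical lower bounds (inclusive) for each level, checked from highest to lowest.
ACTIVITY_LEVEL_THRESHOLDS = [
    (10, "high"),
    (5, "moderate"),
    (1, "low"),
    (0, "inactive"),
]


def activity_level(activity_dates, today: date) -> str:
    """Classify a resident by how many activities fall within the last WINDOW_DAYS days."""
    window_start = today - timedelta(days=WINDOW_DAYS - 1)  # today plus the six days before it
    recent = sum(1 for day in activity_dates if window_start <= day <= today)
    for lower_bound, level in ACTIVITY_LEVEL_THRESHOLDS:
        if recent >= lower_bound:
            return level
    raise ValueError("unreachable: the last threshold is 0")
```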
(...)
For the first iteration, I could just focus on seven days.
(...)
And then the number of activities per resident should be in the range of zero to ten, or eleven.
(...)
I think by tightening that up, it'll give us a better distribution. Let me just try that one more time.
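A quick expected-value check shows why tightening the ranges helps. Spreading 7 to 30 activities over 31 days leaves only about 4.2 of them in the last 7 days, on average. The sketch makes these assumptions: counts and days-ago values are uniform and inclusive, the tightened days-ago range is read as 0 to 6, and 0 to 10 stands in for the "zero to ten, eleven" range mentioned above.

```python
def expected_recent_activities(activities_range, days_ago_range, window_days=7):
    """Expected activities per resident inside the most recent `window_days` days.

    Assumes the activity count is uniform over activities_range and each activity's
    days-ago value is uniform over days_ago_range (both inclusive).
    """
    mean_activities = sum(activities_range) / 2
    low_days, high_days = days_ago_range
    total_days = high_days - low_days + 1
    # days_ago values 0 .. window_days - 1 fall inside the window.
    days_in_window = max(0, min(high_days, window_days - 1) - low_days + 1)
    return mean_activities * days_in_window / total_days


# Earlier configuration: 7-30 activities spread over 0-30 days ago.
spread_out = expected_recent_activities((7, 30), (0, 30))  # 18.5 * 7 / 31, about 4.2
# Tightened configuration: 0-10 activities within 0-6 days ago.
tightened = expected_recent_activities((0, 10), (0, 6))  # 5.0 * 7 / 7 = 5.0
```

Concentrating the dates in the seven-day window means every generated activity counts toward the level, so the ranges map more directly onto the thresholds.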
(...)
Then I'll work out the 30-day thing a little bit later.
(...)
Because ultimately we do want to show that this thing records activities over 30,(...) 300 days.(...) The original prototype was running for, I think, two and a half years, maybe three years. We got a lot of data there.(...) We at least demonstrated an uptake in the tool. There was an upward trend in activity levels, but we're extrapolating that that would be reflective of increased engagement.
(...)
But we don't have any concrete causal data to show that this actually had the impact we were hoping for.
(...)
But just strong intuition that when you bring information in and make caregiving visible, that it does have a positive impact on the community.
(...)
In particular, some of the residents, people who were otherwise sort of accidentally invisible, were being overlooked.
(...)
Then we'll run the server.
(...)
Naturally, the family was concerned.
(...)
Okay, we have a little bit better distribution here.
(...)
I think if I just pick more residents per home,(...) like five to ten,(...) which is sort of more what we're dealing with,
(...)
we would have a fuller distribution. I'll run it one more time. It's 9 a.m.; I've got to head out.
(...)
Migrate.
(...)
make_fake_data. It takes a bit.
(...)
Interestingly, okay, Bonnie is inactive, so of course Bonnie is not going to show up here. That could be an improvement I'd make to the chart.
(...)
In the long run,(...) we'll be looking at 30 days of data, so there will be a likelihood of everybody having done an activity in the last 30 days.
(...)
Most residents didn't hit the high threshold, so, strangely, we'll increase the range here.
(...)
High... oh, maybe I'm just misremembering that high should start at 10,(...) which I can actually pull out. I just recalled that I can pull this out of our global constant, but it's not a big deal. One more time. Just one more time.
(...)
It's not a super tight deadline at 9 a.m. Migrate.
(...)
I'm excited; this is cool.
(...)
I'd like to sort these alphabetically, so now that I have fake data, then I can start seeing the system operating as it would in a more natural context. We can find there's some little improvements we want to make.
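This part of the session mentions two chart tweaks: keeping inactive residents such as Bonnie visible, and sorting residents alphabetically. Both can be sketched in plain Python. In Django this would more likely happen in the query (for example with QuerySet.order_by()); the function name and inputs here are hypothetical.

```python
from collections import Counter


def chart_rows(resident_names, activity_resident_names):
    """Per-resident activity totals, sorted by name, keeping residents with zero activities."""
    counts = Counter(activity_resident_names)
    # Start from the full resident list (not just residents with activities) so that an
    # inactive resident still gets a row; Counter returns 0 for missing keys.
    return [(name, counts[name]) for name in sorted(resident_names, key=str.casefold)]
```

Sorting with str.casefold keeps the order case-insensitive, so "alice" and "Bonnie" sort the way a reader expects.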
(...)
These labels should show up. So there we go. Now we're cooking. This is exciting. This is going to open up a lot of possibilities for development for new contributors to come in and get a nice environment set up to see the system, how it works, and then work on their small task. Hopefully our tasks will be relatively small in scope generally.
(...)
And run the server as well as demonstrating this. Okay,(...) there we go. So now we have a better distribution. High. And I might use a different color to distinguish the high from the low, but essentially we believe we should target the green area and not be too high or too low. So these are kind of warning zones and the danger zone here of inactive. So here's a home. We've got the full distribution. High. Sort of by name.
(...)
Very cool. All right, so I'll wrap it up. I'll commit these off.
(...)
Let's just say add.
(...)
Initial make_fake_data command.
(...)
And it's going to lint that for me.
(...)
We'll publish the branch. So this has been another open source live code hangout.
(...)
If you'd like to follow along with this project, we're at github.com/GeriLife/caregiving.
(...)
Fully open source.
(...)
You can see the pull request I just opened here.
(...)
For just these specific changes. Okay, well, I hope you're doing well and have a great day.