@kowey
Last active August 29, 2015 14:12
{
"metadata": {
"name": "",
"signature": "sha256:cfd825ef318eb3d896cf218060d459df23b04a94c05ea842ebee858bae869d2b"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's read a small tab-delimited file into Orange:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from __future__ import print_function\n",
"import Orange\n",
"\n",
"FILE_A = 'tiny.attach.tab'\n",
"table_a = Orange.data.Table(FILE_A)\n",
"\n",
"print(table_a.domain)\n",
"for inst in table_a:\n",
" print(inst)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[id, foo, CLASS]\n",
"['a1', 'x', 'False']\n",
"['a2', 'y', 'True']\n",
"['a3', 'z', 'False']\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How nice. We can also ask for the range of possible values associated with a variable:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"class_values = table_a.domain['CLASS']\n",
"list(class_values)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"[<orange.Value 'CLASS'='False'>, <orange.Value 'CLASS'='True'>]"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Splendid! So in the A table domain, the 'CLASS' variable is associated with the values 'True' and 'False'.\n",
"\n",
"So shall we look at a second table?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"FILE_B = 'tiny.relate.tab'\n",
"table_b = Orange.data.Table(FILE_B)\n",
"\n",
"print(table_b.domain)\n",
"for inst in table_b:\n",
" print(inst)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[id, bar, CLASS]\n",
"['a1', 'b', 'Narration']\n",
"['a2', 'd', 'Elaboration']\n",
"['a3', 'e', 'Background']\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So far so good. Now, what do you think the values associated with the variable 'CLASS' in table B should be?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"list(table_b.domain['CLASS'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"[<orange.Value 'CLASS'='False'>,\n",
" <orange.Value 'CLASS'='True'>,\n",
" <orange.Value 'CLASS'='Background'>,\n",
" <orange.Value 'CLASS'='Elaboration'>,\n",
" <orange.Value 'CLASS'='Narration'>]"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait, what? Where did 'False' and 'True' come from?\n",
"Hang on, let's check back in table A:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"list(table_a.domain['CLASS'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"[<orange.Value 'CLASS'='False'>,\n",
" <orange.Value 'CLASS'='True'>,\n",
" <orange.Value 'CLASS'='Background'>,\n",
" <orange.Value 'CLASS'='Elaboration'>,\n",
" <orange.Value 'CLASS'='Narration'>]"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Huh. Did you know that variable descriptors in Orange are global and, of course, mutable? I didn't, but [looking more closely at the manual](http://docs.orange.biolab.si/reference/rst/Orange.feature.descriptor.html#Orange.feature.Descriptor), I see it says quite plainly:\n",
"\n",
"> Orange considers two variables (e.g. in two different data tables) the same if they have the same descriptor. It is allowed - but not recommended - to have different descriptors with the same name.\n",
"\n",
"This matters particularly if we use two different tables that each have a local 'CLASS' feature, and especially if we do things like look up the probability for 'True' by its presumed index `1` (`\u0ca0_\u0ca0`)"
]
}
],
"metadata": {}
}
]
}
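The behaviour the notebook runs into can be imitated in a few lines of plain Python. This is only a sketch of the mechanism, not Orange's actual implementation: `_DESCRIPTORS`, `Descriptor`, and `load_table` are hypothetical names invented here, and the rule that unseen values are appended in sorted order is an assumption read off the outputs above.

```python
# Hypothetical stand-in for the behaviour seen above; none of these
# names are Orange APIs.
_DESCRIPTORS = {}  # module-level registry: variable name -> shared descriptor


class Descriptor(object):
    """A named variable with a mutable list of known values."""

    def __init__(self, name):
        self.name = name
        self.values = []

    def add_values(self, new_values):
        # Assumption: values not seen before are appended in sorted order.
        unseen = sorted(set(new_values) - set(self.values))
        self.values.extend(unseen)


def load_table(columns):
    """Build a domain from a dict mapping variable names to column values."""
    domain = {}
    for name, column in columns.items():
        # Reuse the descriptor registered under this name, if there is one.
        descriptor = _DESCRIPTORS.setdefault(name, Descriptor(name))
        descriptor.add_values(column)
        domain[name] = descriptor
    return domain


domain_a = load_table({'CLASS': ['False', 'True', 'False']})
values_before = list(domain_a['CLASS'].values)

domain_b = load_table({'CLASS': ['Narration', 'Elaboration', 'Background']})
values_after = list(domain_a['CLASS'].values)

print(values_before)
print(values_after)
print(domain_a['CLASS'] is domain_b['CLASS'])
```

Loading the second table changes what the first table's domain reports, because both domains hold the very same descriptor object.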
tiny.attach.tab
CLASS id foo
d d d
class
False a1 x
True a2 y
False a3 z
tiny.relate.tab
CLASS id bar
d d d
class
Narration a1 b
Elaboration a2 d
Background a3 e
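One way to sidestep the index problem mentioned at the end of the notebook is to find a value's position by name each time instead of hard-coding it. The sketch below uses plain Python lists rather than Orange objects: `probability_of` is a hypothetical helper, the probabilities are made-up numbers, and the value ordering is the one we might plausibly get if the two tables had been loaded in the opposite order (an assumption, not something the notebook tested).

```python
def probability_of(label, values, distribution):
    """Return the probability for `label`, located by name, not by position."""
    if len(values) != len(distribution):
        raise ValueError('values and distribution differ in length')
    return distribution[values.index(label)]


# Assumed ordering if the relation table had been loaded first.
values = ['Background', 'Elaboration', 'Narration', 'False', 'True']
# Made-up probabilities, aligned with `values`.
distribution = [0.0, 0.0, 0.0, 0.25, 0.75]

by_index = distribution[1]  # silently reads the slot for 'Elaboration'
by_name = probability_of('True', values, distribution)

print(by_index, by_name)
```

Looking the label up costs a linear scan per call, which is negligible for a handful of class values and keeps the code correct however the shared descriptor has grown.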