To run data mining algorithms on large ocean datasets, we need to optimise access to datasets with up to six dimensions.
A generalised 6-dimensional dataset is [X,Y,Z,T,V,E] where:
- X,Y,Z,T are the space/time dimensions,
- V is the variable dimension (e.g. temperature, salinity, zonal velocity) and,
- E is the ensemble dimension (list of realisations or members).
Running data mining algorithms on such a dataset mostly requires re-arranging the 6 dimensions into 2-dimensional arrays with, following the statistics vocabulary, a "sampling" dimension and a "features" dimension. The sampling dimension is along rows, the features along columns. A large dataset can have billions of rows and hundreds of columns.
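As a minimal sketch of this re-arrangement, assuming a small synthetic array with the [X,Y,Z,T,V,E] layout described above (the shapes and variable names are illustrative, not from an actual dataset), NumPy can collapse the space/time/ensemble axes into the sampling dimension and keep the variable axis as features:

```python
import numpy as np

# Hypothetical small 6-D dataset [X, Y, Z, T, V, E]:
# a 4 x 3 x 2 grid, 5 time steps, 3 variables, 2 ensemble members.
data = np.random.rand(4, 3, 2, 5, 3, 2)

# Move the variable axis (V, axis 4) to the end, then flatten all
# remaining axes into a single sampling dimension (rows), keeping
# the variables as the features dimension (columns).
samples = np.moveaxis(data, 4, -1).reshape(-1, data.shape[4])

print(samples.shape)  # (240, 3): 4*3*2*5*2 samples, 3 features
```

The resulting 2-D array is what typical data mining libraries expect as input, with one row per space/time/ensemble point and one column per variable.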
E.g.: