URL: https://dzone.com/users/914595/PyJoey.html

Giuseppe Vettigli - DZone Member


Giuseppe Vettigli

Developer at Research National Council

Napoli, IT

Joined Jul 2011

About

Giuseppe Vettigli works at the Cybernetics Institute of the Italian National Research Council. He is mainly focused on scientific software design and development. His main interests are in Artificial Intelligence, Data Mining and Multimedia applications. He is a Linux user and his favorite programming languages are Java and Python. You can check his blog about Python programming or follow him on Twitter.

Stats

Reputation: 367
Pageviews: 502.8K
Articles: 16
Comments: 0

Articles

Exporting Decision Trees in Textual Format With sklearn
In this post, we explore how to make decision trees using Python and an open data set.
June 12, 2019
· 13,498 Views · 1 Like
Plotting a Calendar in Matplotlib
We quickly go through the code you'll need to get started working with this popular and open source Python library and plotting your data!
April 15, 2019
· 12,421 Views · 3 Likes
Ravel and Unravel With NumPy
We take a quick look at how to work with NumPy by exploring the ravel and unravel methods that come built into this popular Python framework.
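As a minimal sketch of the operations named in this teaser (the array `a` and its values are made up for illustration): `np.ravel` flattens an array, while `np.unravel_index` and `np.ravel_multi_index` convert between a flat index and a multi-dimensional one.

```python
import numpy as np

# Hypothetical 3x4 array used only for illustration.
a = np.arange(12).reshape(3, 4)

# ravel: row-major (C order) 1-D version of the array
flat = a.ravel()

# argmax returns an index into the flattened array ...
flat_index = int(np.argmax(a))

# ... which unravel_index converts back to (row, column)
row, col = np.unravel_index(flat_index, a.shape)

# ravel_multi_index goes the other way, from (row, column) to the flat index
same_index = np.ravel_multi_index((row, col), a.shape)

print(flat)
print(flat_index, (row, col), same_index)
```

A common use is exactly this pairing: reductions such as `argmax` report flat positions, and `unravel_index` recovers the coordinates in the original shape.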
April 2, 2019
· 13,548 Views · 3 Likes
A Visual Introduction to Gap Statistics
A data expert shows us how to improve the findings of K-Means clustering in Python by employing Gap Statistics. Read on to get started!
January 24, 2019
· 9,980 Views · 1 Like
Spotting Outliers With Isolation Forest Using sklearn
In this post we take a look at how to detect outliers in your data using the isolation forest algorithm. Read on for the details!
October 10, 2017
· 29,293 Views · 2 Likes
Dates in Pandas Cheatsheet
Check out a large collection of not-so-sloppy snippets for doing scientific computing and data visualization in Python with pandas.
July 14, 2017
· 12,344 Views · 3 Likes
Creating a Heat Map of Ratios With Seaborn
See how to create a heat map containing the ratio of males to females in the population by age for 1970 to 2015 with Seaborn.
June 17, 2017
· 8,108 Views · 2 Likes
Andrews Curves
Andrews curves are a method for visualizing multidimensional data by mapping each observation onto a function. For an observation $x = (x_1, x_2, x_3, \dots)$, this function is defined as

$f_x(\theta) = \frac{x_1}{\sqrt{2}} + x_2 \sin(\theta) + x_3 \cos(\theta) + x_4 \sin(2\theta) + x_5 \cos(2\theta) + \dots$

It has been shown that Andrews curves preserve means, distances (up to a constant) and variances. This means that curves that lie close together suggest that the corresponding data points are also close together. Now we will demonstrate the effectiveness of Andrews curves on the iris dataset (which we already used in an earlier post). Let's create a function that computes the values of the curve for a single sample:

    import numpy as np

    def andrew_curve4(x, theta):
        # iris has four dimensions, so only the first four terms are needed
        base_functions = [lambda x: x[0] / np.sqrt(2.),
                          lambda x: x[1] * np.sin(theta),
                          lambda x: x[2] * np.cos(theta),
                          lambda x: x[3] * np.sin(2. * theta)]
        curve = np.zeros(len(theta))
        for f in base_functions:
            curve = curve + f(x)
        return curve

At this point we can load the dataset and plot the curves for a subset of samples:

    samples = np.loadtxt('iris.csv', usecols=[0, 1, 2, 3], delimiter=',')
    # samples = samples - np.mean(samples)
    # samples = samples / np.std(samples)
    classes = np.loadtxt('iris.csv', usecols=[4], delimiter=',', dtype=str)
    theta = np.linspace(-np.pi, np.pi, 100)

    import pylab as pl
    for s in samples[:20]:       # setosa
        pl.plot(theta, andrew_curve4(s, theta), 'r')
    for s in samples[50:70]:     # versicolor
        pl.plot(theta, andrew_curve4(s, theta), 'b')
    for s in samples[100:120]:   # virginica
        pl.plot(theta, andrew_curve4(s, theta), 'g')
    pl.xlim(-np.pi, np.pi)
    pl.show()

In the plot above, each color represents a class, and we can easily see that the lines for samples from the same class have similar curves.
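The function above is written for the four features of iris. A minimal sketch of the same idea for any number of features might look as follows; the name `andrews_curve` and the sample values are assumptions made for illustration and are not part of the original post.

```python
import numpy as np

def andrews_curve(x, theta):
    """Evaluate the Andrews curve of one sample x at the angles theta.

    Terms follow the usual pattern:
    x1/sqrt(2), x2*sin(t), x3*cos(t), x4*sin(2t), x5*cos(2t), ...
    """
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta, dtype=float)
    curve = np.full(theta.shape, x[0] / np.sqrt(2.0))
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            curve = curve + xi * np.sin(k * theta)  # odd positions use sine
        else:
            curve = curve + xi * np.cos(k * theta)  # even positions use cosine
    return curve

theta = np.linspace(-np.pi, np.pi, 100)
sample = [5.1, 3.5, 1.4, 0.2]  # illustrative four-feature sample
values = andrews_curve(sample, theta)
print(values[:3])
```

For a four-feature sample this reduces to the same four terms used by `andrew_curve4`.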
January 29, 2015
· 14,022 Views · 6 Likes
Quick HDF5 with Pandas
HDF5 is a format designed to store large numerical arrays of homogeneous type. It comes in particularly handy when you need to organize your data models in a hierarchical fashion and you also need a fast way to retrieve the data. Pandas implements a quick and intuitive interface for this format, and in this post we will briefly introduce how it works. We can create an HDF5 file using the HDFStore class provided by Pandas:

    import numpy as np
    from pandas import HDFStore, DataFrame

    # create (or open) an hdf5 file in append mode
    hdf = HDFStore('storage.h5')

Now we can store a dataset in the file we just created:

    df = DataFrame(np.random.rand(5, 3), columns=('A', 'B', 'C'))
    # put the dataset in the storage
    hdf.put('d1', df, format='table', data_columns=True)

The object that represents the hdf file in Python behaves like a dictionary, and we can access our data using the name of the dataset as the key:

    print(hdf['d1'].shape)

    (5, 3)

The data in the storage can be manipulated. For example, we can append new data to the dataset we just created:

    hdf.append('d1', DataFrame(np.random.rand(5, 3), columns=('A', 'B', 'C')),
               format='table', data_columns=True)
    hdf.close()  # closes the file

There are many ways to open an hdf5 storage. We could use the constructor of the class HDFStore again, but the function read_hdf also lets us query the data:

    from pandas import read_hdf

    # this query selects the columns A and B
    # where the value of A is greater than 0.5
    hdf = read_hdf('storage.h5', 'd1', where=['A>.5'], columns=['A', 'B'])

At this point, we have a storage which contains a single dataset. The structure of the storage can be organized using groups. In the following example we add three different datasets to the hdf5 file, two in the same group and another one in a different one:

    hdf = HDFStore('storage.h5')
    hdf.put('tables/t1', DataFrame(np.random.rand(20, 5)))
    hdf.put('tables/t2', DataFrame(np.random.rand(10, 3)))
    hdf.put('new_tables/t1', DataFrame(np.random.rand(15, 2)))

Our hdf5 storage now looks like this:

    print(hdf.info())  # on older pandas versions, print(hdf) gives the same listing

    File path: storage.h5
    /d1               frame_table  (typ->appendable,nrows->10,ncols->3,indexers->[index],dc->[A,B,C])
    /new_tables/t1    frame        (shape->[15,2])
    /tables/t1        frame        (shape->[20,5])
    /tables/t2        frame        (shape->[10,3])

On the left we can see the hierarchy of the groups added to the storage, in the middle we have the type of dataset, and on the right there is the list of attributes attached to the dataset. Attributes are pieces of metadata you can attach to objects in the file, and the attributes we see here are automatically created by Pandas to describe the information required to recover the data from the hdf5 storage system.
August 22, 2014
· 82,776 Views
Linear Regression Using Numpy
A few posts ago, we saw how to use the function numpy.linalg.lstsq(...) to solve an over-determined system. This time, we'll use it to estimate the parameters of a regression line. A linear regression line is of the form $w_1 x + w_2 = y$, and it is the line that minimizes the sum of the squares of the vertical distances from each data point to the line. So, given n pairs of data $(x_i, y_i)$, the parameters that we are looking for are $w_1$ and $w_2$, which minimize the error

$E = \sum_{i=1}^{n} \left( y_i - (w_1 x_i + w_2) \right)^2$

and we can compute the parameter vector $w = (w_1, w_2)^T$ as the least-squares solution of the following over-determined system:

$\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$

Let's use numpy to compute the regression line:

    from numpy import arange, array, ones, random, linalg
    from pylab import plot, show

    xi = arange(0, 9)            # linearly generated sequence
    A = array([xi, ones(9)])     # rows: x values and a column of ones (transposed below)
    y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]

    # obtaining the parameters
    w = linalg.lstsq(A.T, y, rcond=None)[0]

    # plotting the line
    line = w[0] * xi + w[1]      # regression line
    plot(xi, line, 'r-', xi, y, 'o')
    show()

We can see the result in the plot below. You can find more about data fitting using numpy in the following posts:

Polynomial curve fitting
Curve fitting using fmin
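As a cross-check of the least-squares solution described above, the same parameters can be obtained from the normal equations $(A^T A) w = A^T y$. The sketch below reuses the data from the post; the variable names `w_lstsq`, `w_normal` and `sse` are introduced here for illustration only.

```python
import numpy as np

xi = np.arange(0, 9)
y = np.array([19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24])

# Design matrix with one row [x_i, 1] per data point
A = np.column_stack([xi, np.ones(len(xi))])

# Least-squares solution via lstsq, as in the post
w_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]

# Same solution from the normal equations (A^T A) w = A^T y
w_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Sum of squared residuals of the fitted line
residuals = y - A @ w_lstsq
sse = float(residuals @ residuals)

print(w_lstsq, w_normal, sse)
```

Solving the normal equations directly is fine for a small, well-conditioned problem like this one; `lstsq` is generally the more robust choice when the columns of the design matrix are nearly dependent.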
March 26, 2012
· 13,978 Views
Computing a disparity map in OpenCV
A disparity map contains information related to the distance of the objects of a scene from a viewpoint. In this example we will see how to compute a disparity map from a stereo pair and how to use the map to cut out the objects that are far from the cameras. The stereo pair is represented by two input images taken with two cameras separated by a distance, and the disparity map is derived from the offset of the objects between them. There are various algorithms to compute a disparity map; the one used here is the graph cut algorithm implemented in OpenCV. To use it we have to call the function CreateStereoGCState() to initialize the data structure needed by the algorithm and then use the function FindStereoCorrespondenceGC() to get the disparity map. Let's see the code:

    import cv  # legacy OpenCV Python bindings (OpenCV 2.x and earlier)

    def cut(disparity, image, threshold):
        for i in range(0, image.height):
            for j in range(0, image.width):
                # keep closer object
                if cv.GetReal2D(disparity, i, j) > threshold:
                    cv.Set2D(disparity, i, j, cv.Get2D(image, i, j))

    # loading the stereo pair
    left = cv.LoadImage('scene_l.bmp', cv.CV_LOAD_IMAGE_GRAYSCALE)
    right = cv.LoadImage('scene_r.bmp', cv.CV_LOAD_IMAGE_GRAYSCALE)

    disparity_left = cv.CreateMat(left.height, left.width, cv.CV_16S)
    disparity_right = cv.CreateMat(left.height, left.width, cv.CV_16S)

    # data structure initialization
    state = cv.CreateStereoGCState(16, 2)
    # running the graph-cut algorithm
    cv.FindStereoCorrespondenceGC(left, right, disparity_left, disparity_right, state)

    disp_left_visual = cv.CreateMat(left.height, left.width, cv.CV_8U)
    cv.ConvertScale(disparity_left, disp_left_visual, -20)
    cv.Save("disparity.pgm", disp_left_visual)  # save the map

    # cutting the objects farther than a threshold (120)
    cut(disp_left_visual, left, 120)

    cv.NamedWindow('Disparity map', cv.CV_WINDOW_AUTOSIZE)
    cv.ShowImage('Disparity map', disp_left_visual)
    cv.WaitKey()

These are the two input images I used to test the program (left and right, respectively):

Result using threshold = 100
Result using threshold = 120
Result using threshold = 180

Source: http://glowingpython.blogspot.com/2011/11/computing-disparity-map-in-opencv.html
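The pixel-by-pixel loop in `cut` can also be written as a single array operation. The sketch below is not the post's code: it assumes the disparity map and the image are already available as NumPy arrays of the same shape (the legacy `cv` matrices used in the post are a different type), and the tiny arrays are made up for illustration.

```python
import numpy as np

def cut(disparity, image, threshold):
    """Where disparity exceeds the threshold, take the pixel from image;
    elsewhere keep the disparity value. Returns a new array."""
    disparity = np.asarray(disparity)
    image = np.asarray(image)
    return np.where(disparity > threshold, image, disparity)

# Made-up 2x2 arrays standing in for a disparity map and a grayscale image.
disparity = np.array([[90, 130], [200, 110]], dtype=np.uint8)
image = np.array([[10, 20], [30, 40]], dtype=np.uint8)

result = cut(disparity, image, 120)
print(result)
```

Unlike the loop in the post, this version leaves its inputs unchanged and returns the combined array, which makes it easier to try several thresholds on the same disparity map.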
February 21, 2012
· 26,118 Views
Monte Carlo Estimate for Pi with NumPy
In this post we will use a Monte Carlo method to approximate pi. The idea behind the method is the following: draw the unit square and the unit circle. Consider only the part of the circle inside the square (a quadrant) and pick a large number of points uniformly at random over the square. The quadrant has pi/4 the area of the square, so of the total number of points that fall within the square, the fraction that falls inside the quadrant is approximately pi/4. This gives a way to approximate pi/4 as the ratio between the number of points inside the circle and the total number of points; multiplying it by 4 gives pi. Let's see the Python script that implements the method discussed above using numpy's indexing facilities:

    from pylab import plot, show, axis
    from numpy import random, sqrt, pi

    # scattering n points over the unit square
    n = 1000000
    p = random.rand(n, 2)

    # counting the points inside the unit circle
    idx = sqrt(p[:,0]**2 + p[:,1]**2) < 1

    plot(p[idx,0], p[idx,1], 'b.')      # points inside
    plot(p[~idx,0], p[~idx,1], 'r.')    # points outside
    axis([-0.1, 1.1, -0.1, 1.1])
    show()

    # estimation of pi
    print('%0.16f' % (idx.sum() / n * 4), 'result')
    print('%0.16f' % pi, 'real pi')

The program will print the approximation of pi on the standard output (the estimate changes from run to run):

    3.1457199999999998 result
    3.1415926535897931 real pi

and will show a graph with the generated points. Note that the lines of code used to estimate pi are just 3!

Source: http://glowingpython.blogspot.com/2012/01/monte-carlo-estimate-for-pi-with-numpy.html
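To get a feeling for how far such an estimate can be from pi, the hit fraction can be treated as a binomial proportion. The following sketch is an addition to the post: the function name `estimate_pi`, the fixed seed and the standard-error formula are choices made here for illustration.

```python
import numpy as np

def estimate_pi(n, seed=0):
    """Monte Carlo estimate of pi from n points in the unit square,
    together with the standard error of the estimate."""
    rng = np.random.default_rng(seed)      # seeded generator for repeatable runs
    p = rng.random((n, 2))                 # n points uniform over [0, 1) x [0, 1)
    inside = (p ** 2).sum(axis=1) < 1.0    # True for points inside the quadrant
    frac = inside.mean()                   # fraction of hits, approximates pi/4
    estimate = 4.0 * frac
    # standard error of a binomial proportion, scaled by the same factor 4
    std_error = 4.0 * np.sqrt(frac * (1.0 - frac) / n)
    return estimate, std_error

estimate, std_error = estimate_pi(1_000_000)
print(estimate, std_error)
```

The standard error shrinks like 1/sqrt(n), so each additional correct digit costs roughly a hundred times more points.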
January 25, 2012
· 10,582 Views
Face and Eye Detection in OpenCV
The goal of object detection is to find an object of a pre-defined class in an image. In this post we will see how to use the Haar Classifier implemented in OpenCV in order to detect faces and eyes in a single image. (Note: this article is part of a series on object detection with OpenCV in Python. --Ed.) We are going to use two trained classifiers stored in two XML files:

haarcascade_frontalface_default.xml - that you can find in the directory /data/haarcascades/ of your OpenCV installation
haarcascade_eye.xml - that you can download from this website.

The first one is able to detect faces and the second one eyes. To use a trained classifier stored in an XML file we need to load it into memory using the function cv.Load() and call the function cv.HaarDetectObjects() to detect the objects. Let's see the snippet:

    import cv  # legacy OpenCV Python bindings (OpenCV 2.x and earlier)

    imcolor = cv.LoadImage('detectionimg.jpg')  # input image

    # loading the classifiers
    haarFace = cv.Load('haarcascade_frontalface_default.xml')
    haarEyes = cv.Load('haarcascade_eye.xml')

    # running the classifiers
    storage = cv.CreateMemStorage()
    detectedFace = cv.HaarDetectObjects(imcolor, haarFace, storage)
    detectedEyes = cv.HaarDetectObjects(imcolor, haarEyes, storage)

    # draw a green rectangle where the face is detected
    if detectedFace:
        for face in detectedFace:
            cv.Rectangle(imcolor, (face[0][0], face[0][1]),
                         (face[0][0] + face[0][2], face[0][1] + face[0][3]),
                         cv.RGB(155, 255, 25), 2)

    # draw a purple rectangle where the eye is detected
    if detectedEyes:
        for eye in detectedEyes:
            cv.Rectangle(imcolor, (eye[0][0], eye[0][1]),
                         (eye[0][0] + eye[0][2], eye[0][1] + eye[0][3]),
                         cv.RGB(155, 55, 200), 2)

    cv.NamedWindow('Face Detection', cv.CV_WINDOW_AUTOSIZE)
    cv.ShowImage('Face Detection', imcolor)
    cv.WaitKey()

These images are produced by running the script with two different inputs. The first one is obtained from an image that contains two faces and four eyes, and the second one is obtained from an image that contains one face and two eyes (the shakira.jpg we used in the post about PCA).
January 20, 2012
· 18,711 Views
How To: Plot a Function of Two Variables with matplotlib
In this post we will see how to visualize a function of two variables in two ways. First, we will create an intensity image of the function and, second, we will use the 3D plotting capabilities of matplotlib to create a shaded surface plot. So, let's go with the code:

    from numpy import exp, arange
    from pylab import meshgrid, cm, imshow, contour, clabel, colorbar, axis, title, show

    # the function that I'm going to plot
    def z_func(x, y):
        return (1 - (x**2 + y**3)) * exp(-(x**2 + y**2) / 2)

    x = arange(-3.0, 3.0, 0.1)
    y = arange(-3.0, 3.0, 0.1)
    X, Y = meshgrid(x, y)   # grid of points
    Z = z_func(X, Y)        # evaluation of the function on the grid

    im = imshow(Z, cmap=cm.RdBu)  # drawing the function
    # adding the contour lines with labels
    cset = contour(Z, arange(-1, 1.5, 0.2), linewidths=2, cmap=cm.Set2)
    clabel(cset, inline=True, fmt='%1.1f', fontsize=10)
    colorbar(im)  # adding the colorbar on the right
    # latex fashion title
    title('$z=(1-(x^2+y^3)) e^{-(x^2+y^2)/2}$')
    show()

The script produces the intensity image with labeled contour lines. Now we are going to use the values stored in X, Y and Z to make a 3D plot using the mplot3d toolkit. Here's the snippet:

    from mpl_toolkits.mplot3d import Axes3D
    from matplotlib import cm
    from matplotlib.ticker import LinearLocator, FormatStrFormatter
    import matplotlib.pyplot as plt

    fig = plt.figure()
    # older matplotlib versions used fig.gca(projection='3d') here
    ax = fig.add_subplot(111, projection='3d')
    surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                           cmap=cm.RdBu, linewidth=0, antialiased=False)

    ax.zaxis.set_major_locator(LinearLocator(10))
    ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

    fig.colorbar(surf, shrink=0.5, aspect=5)
    plt.show()

The result is a shaded surface plot of the same function.

Source: http://glowingpython.blogspot.com/2012/01/how-to-plot-two-variable-functions-with.html
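The grid evaluation used for the plots can also be inspected numerically. As a small sketch (not part of the original post), the following locates the largest value of the function on the same grid; note that with `meshgrid`'s default indexing the first index of `Z` runs over `y` and the second over `x`. The names `x_max`, `y_max` and `z_max` are introduced here for illustration.

```python
import numpy as np

def z_func(x, y):
    # same function as in the post
    return (1 - (x**2 + y**3)) * np.exp(-(x**2 + y**2) / 2)

x = np.arange(-3.0, 3.0, 0.1)
y = np.arange(-3.0, 3.0, 0.1)
X, Y = np.meshgrid(x, y)   # default 'xy' indexing: Z[i, j] = z_func(x[j], y[i])
Z = z_func(X, Y)

# position of the largest grid value, converted from a flat index to (row, column)
i, j = np.unravel_index(np.argmax(Z), Z.shape)
x_max, y_max, z_max = x[j], y[i], Z[i, j]

print(x_max, y_max, z_max)
```

Knowing the range of `Z` this way can help when choosing contour levels such as the `arange(-1, 1.5, 0.2)` used in the post.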
January 16, 2012
· 31,858 Views
Sound Synthesis with Numpy
Physically, sound is an oscillation of a mechanical medium that makes the surrounding air also oscillate and transport the sound as a compression wave. Mathematically, the oscillations can be described as

$y(t) = A \sin(2 \pi f t)$

where t is the time, f the frequency of the oscillation and A its amplitude. Each musical note vibrates at a particular frequency, and to generate a tone we have to generate an oscillation with the appropriate frequency. The following table shows the complete musical scale between middle A and A-880. The first column contains the tone and the second the frequency (in Hz) that we have to use to generate it:

    Tone      Freq
    A         440
    B flat    466
    B         494
    C         523
    C sharp   554
    D         587
    D sharp   622
    E         659
    F         698
    F sharp   740
    G         784
    A flat    831
    A         880

Sound on a computer is a sequence of numbers, and we are going to see how to generate an array that represents a musical tone with numpy. The following function generates a note using the formula above:

    from numpy import linspace, sin, pi, int16

    # tone synthesis
    def note(freq, duration, amp=1, rate=44100):
        t = linspace(0, duration, int(duration * rate))
        data = sin(2 * pi * freq * t) * amp
        return data.astype(int16)  # two byte integers

And we can use this function to generate an A tone of 2 seconds with 44100 samples per second in this way:

    from scipy.io.wavfile import write
    from pylab import plot, show, axis

    # A tone, 2 seconds, 44100 samples per second
    tone = note(440, 2, amp=10000)

    write('440hzAtone.wav', 44100, tone)  # writing the sound to a file

    plot(linspace(0, 2, 2 * 44100), tone)
    axis([0, 0.4, 15000, -15000])
    show()

The script puts the sound into a wav file that we can play with an external player. The plot shows a part of the signal generated by the script.

Source: http://glowingpython.blogspot.com/2011/09/sound-synthesis.html
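The frequencies in the table do not have to be typed in by hand. Assuming the table uses twelve-tone equal temperament (which the post does not state explicitly), each semitone multiplies the frequency by the twelfth root of 2. The sketch below regenerates the table from A-440; the helper name `semitone_frequency` is hypothetical.

```python
# Tone names in the order used by the table in the post.
tones = ['A', 'B flat', 'B', 'C', 'C sharp', 'D', 'D sharp',
         'E', 'F', 'F sharp', 'G', 'A flat', 'A']

def semitone_frequency(k, base=440.0):
    """Frequency k semitones above base, assuming twelve-tone equal temperament."""
    return base * 2.0 ** (k / 12.0)

# Pair each tone with its frequency rounded to the nearest Hz.
scale = [(name, round(semitone_frequency(k))) for k, name in enumerate(tones)]

for name, freq in scale:
    print(f'{name:8s} {freq}')
```

Any of these values can be passed as the frequency argument of the `note` function from the post to synthesize the corresponding tone.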
November 14, 2011
· 15,407 Views
The sampling theorem explained with numpy
The sampling theorem states that a continuous signal x(t) bandlimited to B Hz can be recovered from its samples x[n] = x(n*T), where n is an integer, without loss of any information if T is less than or equal to 1/(2B), that is, if the sampling rate 1/T is at least 2B. We call 2B the Nyquist rate. Sampling at a rate below the Nyquist rate is called undersampling, and it leads to the aliasing effect. Let's observe the aliasing effect with the following Python script:

    from numpy import linspace, cos, pi, ceil, floor, arange
    from pylab import plot, show, axis

    # a signal bandlimited to 40 Hz, drawn on a dense grid
    # (400 points over 0.6 s, about 667 Hz)
    f = 40  # Hz
    tmin = -0.3
    tmax = 0.3
    t = linspace(tmin, tmax, 400)
    x = cos(2*pi*t) + cos(2*pi*f*t)  # signal sampling
    plot(t, x)

    # sampling the signal with a sampling rate of 80 Hz
    # in this case, we are using the Nyquist rate.
    T = 1/80.0
    nmin = ceil(tmin / T)
    nmax = floor(tmax / T)
    n = arange(nmin, nmax)
    x1 = cos(2*pi*n*T) + cos(2*pi*f*n*T)
    plot(n*T, x1, 'bo')

    # sampling the signal with a sampling rate of 35 Hz
    # note that 35 Hz is under the Nyquist rate.
    T = 1/35.0
    nmin = ceil(tmin / T)
    nmax = floor(tmax / T)
    n = arange(nmin, nmax)
    x2 = cos(2*pi*n*T) + cos(2*pi*f*n*T)
    plot(n*T, x2, '-r.', markersize=8)

    axis([-0.3, 0.3, -1.5, 2.3])
    show()

In the resulting figure, the blue curve is the original signal, the blue dots are the samples obtained with the Nyquist rate and the red dots are the samples obtained with 35 Hz. It's easy to see that the blue samples are enough to recover the blue curve, while the red ones are not enough to capture the oscillations of the signal.
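The aliasing effect can also be checked numerically. In the sketch below (an addition to the post, with variable names chosen here), the 40 Hz component sampled at 35 Hz produces exactly the same samples as a 5 Hz cosine, which is why the undersampled points cannot capture the fast oscillation. The 1 Hz component of the post's signal is left out because it is well below half the sampling rate and is not aliased.

```python
import numpy as np

f = 40.0    # frequency of the fast component in Hz (from the post)
fs = 35.0   # sampling rate in Hz, below the Nyquist rate of 80 Hz
n = np.arange(0, 50)

# samples of the 40 Hz cosine taken at 35 Hz
samples = np.cos(2 * np.pi * f * n / fs)

# apparent frequency: distance from f to the nearest multiple of fs
f_alias = abs(f - fs * round(f / fs))

# samples of a cosine at the apparent frequency, taken at the same instants
alias_samples = np.cos(2 * np.pi * f_alias * n / fs)

print(f_alias)
print(np.allclose(samples, alias_samples))
```

Repeating the check with `fs = 80.0` or higher gives an apparent frequency equal to the true one, in line with the statement of the theorem.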
November 2, 2011
· 7,449 Views

Refcards

Refcard #183

Practical Data Mining with Python
