Python Programming Glossary: dataset

IOError when trying to open existing files

http://stackoverflow.com/questions/10802418/ioerror-when-trying-to-open-existing-files

over 500 files; 1 file gives one list so that I can build a dataset. THE CODE: #!/usr/local/bin... import os, string; from sys import version..
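A minimal sketch of reading many files into one list per file, which is roughly what the asker describes (the directory name and the parsing step are placeholder assumptions, not the asker's actual code; building full paths with os.path.join is the usual fix for an IOError on files that do exist):

    import os

    data_dir = "data"                          # hypothetical directory holding the ~500 files
    dataset = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)    # a bare filename raises IOError if the CWD differs
        with open(path) as f:                  # 'with' closes the file even on errors
            dataset.append(f.read().split())   # one list per file (placeholder parsing)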

when to commit data in ZODB

http://stackoverflow.com/questions/11254384/when-to-commit-data-in-zodb

and thus can take a while depending on the size of your dataset. Using transactions to manage memory: you are trying to build a very large dataset by using persistence to work around constraints with memory.. a commit signals you have completed constructing your dataset, something you can use as one atomic whole. What you need to..
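A rough sketch of one way to keep memory bounded while building a large persistent structure: flush with savepoints along the way and commit once at the end (the storage file, batch size, and payload are illustrative, not the thread's exact code):

    import transaction
    from ZODB import DB
    from ZODB.FileStorage import FileStorage
    from BTrees.OOBTree import OOBTree

    db = DB(FileStorage("data.fs"))          # illustrative storage file
    root = db.open().root()
    root["dataset"] = tree = OOBTree()

    for i in range(1000000):                 # stand-in for the real record source
        tree[i] = "record %d" % i            # placeholder payload
        if i % 10000 == 0:
            transaction.savepoint(True)      # flush modified objects so they can be released from memory
    transaction.commit()                     # one commit once the dataset is complete as a whole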

Large, persistent DataFrame in pandas

http://stackoverflow.com/questions/11622652/large-persistent-dataframe-in-pandas

numeric data. With SAS I can import a csv file into a SAS dataset and it can be as large as my hard drive. Is there something..
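A hedged sketch of one common workaround for CSVs that do not fit comfortably in memory: read in chunks rather than all at once (the file name and chunk size are assumptions):

    import pandas as pd

    chunks = pd.read_csv("big.csv", chunksize=100000)   # iterator of DataFrames, not one giant frame
    df = pd.concat(chunks, ignore_index=True)           # concatenate once at the end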

Python: tf-idf-cosine: to find document similarity

http://stackoverflow.com/questions/12118720/python-tf-idf-cosine-to-find-document-similarity

import TfidfVectorizer; from sklearn.datasets import fetch_20newsgroups; twenty = fetch_20newsgroups(); tfidf = TfidfVectorizer().. the cosine distances of one document (e.g. the first in the dataset) and all of the others, you just need to compute the dot products..
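Because TfidfVectorizer L2-normalizes rows by default, cosine similarity against the first document reduces to dot products, roughly like this (a sketch, not the answer's exact code):

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer

    twenty = fetch_20newsgroups()
    tfidf = TfidfVectorizer().fit_transform(twenty.data)      # sparse matrix, one row per document

    # rows are unit-length, so the dot product with row 0 is the cosine similarity
    cosine_similarities = (tfidf * tfidf[0].T).toarray().ravel()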

Converting between datetime, Timestamp and datetime64

http://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64

dt64. Update: a somewhat nasty example in my dataset (perhaps the motivating example) seems to be: dt64 = numpy.datetime64..
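A small sketch of the round trips the thread discusses (the timestamp value is made up):

    import numpy as np
    import pandas as pd

    dt64 = np.datetime64("2012-05-01T01:00:00")   # illustrative value

    ts = pd.Timestamp(dt64)          # numpy.datetime64 -> pandas Timestamp
    dt = ts.to_pydatetime()          # Timestamp -> plain datetime.datetime
    back = np.datetime64(dt)         # datetime -> numpy.datetime64 again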

“Large data” work flows using pandas

http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas

but I currently lack an out-of-core workflow for large datasets. I'm not talking about big data that requires a distributed.. drive. My first thought is to use HDFStore to hold large datasets on disk and pull only the pieces I need into dataframes for.. information like criminal records, bankruptcies, etc... The datasets I use every day have nearly 1,000 to 2,000 fields on average..
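A rough outline of the HDFStore pattern the excerpt mentions: append chunks to a table on disk, then select only the rows and columns needed (the file and column names are hypothetical; PyTables must be installed):

    import pandas as pd

    store = pd.HDFStore("workflow.h5")
    for chunk in pd.read_csv("records.csv", chunksize=50000):        # stream the raw file
        store.append("df", chunk, data_columns=["state", "year"])    # indexed, queryable columns

    subset = store.select("df", where="state == 'OH' & year > 2000")  # pull only what is needed
    store.close()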

pandas: How do I split text in a column into multiple columns?

http://stackoverflow.com/questions/17116814/pandas-how-do-i-split-text-in-a-column-into-multiple-columns

in 'text to columns' function and a quick macro, but my dataset has too many records for Excel to handle. Ultimately I want..
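One way this is commonly done in pandas, sketched with made-up data and column names:

    import pandas as pd

    df = pd.DataFrame({"raw": ["a_1_x", "b_2_y"]})          # illustrative data
    parts = df["raw"].str.split("_", expand=True)           # one new column per piece
    parts.columns = ["letter", "number", "code"]            # hypothetical names
    df = df.join(parts)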

more efficient way to calculate distance in numpy?

http://stackoverflow.com/questions/17527340/more-efficient-way-to-calculate-distance-in-numpy

'euclidean'. EDIT: I cannot run tests on such a large dataset, but these timings are rather enlightening: len_a, len_b = 10000.. d: 1 loops, best of 3: 221 ms per loop. For the above smaller dataset I can get a slight speed-up over your method with scipy.spatial.distance.cdist..
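The scipy call the excerpt refers to, in a minimal form (the array shapes are illustrative):

    import numpy as np
    from scipy.spatial.distance import cdist

    a = np.random.rand(1000, 3)    # 1000 points in 3-D (made-up sizes)
    b = np.random.rand(2000, 3)
    d = cdist(a, b, "euclidean")   # (1000, 2000) matrix of pairwise Euclidean distances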

Peak detection in a 2D array

http://stackoverflow.com/questions/3684484/peak-detection-in-a-2d-array

a local maximum filter. Here is the result on your first dataset of 4 paws (I also ran it on the second dataset of 9 paws and it worked as well). Here is how you do it: import..
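A condensed sketch of the local-maximum-filter idea (the window size and threshold are assumptions, not the answer's exact parameters):

    import numpy as np
    from scipy.ndimage import maximum_filter

    data = np.random.rand(32, 32)                      # placeholder 2-D array
    local_max = maximum_filter(data, size=3) == data   # True where a pixel equals its neighborhood maximum
    peaks = local_max & (data > 0.7)                   # drop flat/background maxima (threshold assumed)
    peak_coords = np.argwhere(peaks)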

Calculating Pearson correlation and significance in Python

http://stackoverflow.com/questions/3949226/calculating-pearson-correlation-and-significance-in-python

coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson's correlation requires that each dataset be normally distributed. Like other correlation coefficients.. the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the..
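The scipy routine described here, in brief (the arrays are toy data):

    from scipy.stats import pearsonr

    x = [1, 2, 3, 4, 5]
    y = [2, 1, 4, 3, 7]
    r, p = pearsonr(x, y)   # r: linear correlation; p: probability of a correlation at least this extreme by chance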

How do you remove duplicates from a list in Python whilst preserving order?

http://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-python-whilst-preserving-order

EDIT: If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com..
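The usual order-preserving recipe looks roughly like this; the ordered-set link in the excerpt is the alternative for deduplicating the same data repeatedly:

    def unique_preserving_order(seq):
        seen = set()
        seen_add = seen.add              # local binding avoids an attribute lookup per element
        return [x for x in seq if not (x in seen or seen_add(x))]

    unique_preserving_order([1, 3, 1, 2, 3])   # [1, 3, 2]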

How to remove stop words using nltk or python

http://stackoverflow.com/questions/5486337/how-to-remove-stop-words-using-nltk-or-python

to remove stop words using nltk or python. So I have a dataset that I would like to remove stop words from, using stopwords.words.. take out these words. I have a list of the words from this dataset already; the part I'm struggling with is comparing to this list..
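A small sketch of the comparison the asker is struggling with, using NLTK's English stop word list (the word list is a placeholder, and the stopwords corpus must have been downloaded once):

    from nltk.corpus import stopwords   # requires a prior nltk.download('stopwords')

    stop_set = set(stopwords.words("english"))
    words = ["this", "is", "a", "sample", "dataset"]            # placeholder word list
    filtered = [w for w in words if w.lower() not in stop_set]  # ['sample', 'dataset']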

Multivariate spline interpolation in python/scipy?

http://stackoverflow.com/questions/6238250/multivariate-spline-interpolation-in-python-scipy

False. Even if you have enough RAM, keeping the filtered dataset around can be a big speedup if you need to call map_coordinates..
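The speed-up the excerpt hints at: spline-filter the data once, then pass prefilter=False on every map_coordinates call (the grid shape and coordinates are illustrative):

    import numpy as np
    from scipy import ndimage

    data = np.random.rand(50, 60, 70)                   # placeholder 3-D grid
    filtered = ndimage.spline_filter(data, order=3)     # do the expensive filtering once

    coords = np.array([[10.4, 20.1],                    # (ndim, npoints) fractional indices
                       [15.7, 30.2],
                       [5.3, 40.8]])
    values = ndimage.map_coordinates(filtered, coords, order=3, prefilter=False)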

memory-efficient built-in SqlAlchemy iterator/generator?

http://stackoverflow.com/questions/7389759/memory-efficient-built-in-sqlalchemy-iterator-generator

that intelligently fetched bite-sized chunks of the dataset: for thing in session.query(Things): analyze(thing).. To avoid this..
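One hedged sketch of chunked iteration: Query.yield_per is a real SQLAlchemy method, but the Things model, database URL, and batch size below are placeholders standing in for the asker's setup (windowed queries are the other option discussed in that thread); SQLAlchemy 1.4+ import paths assumed:

    from sqlalchemy import Column, Integer, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Things(Base):                              # stand-in for the asker's mapped class
        __tablename__ = "things"
        id = Column(Integer, primary_key=True)

    engine = create_engine("sqlite:///things.db")    # illustrative database URL
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    for thing in session.query(Things).yield_per(1000):   # stream rows in batches instead of loading them all
        print(thing.id)                                    # stand-in for the asker's analyze(thing)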

plotting a smooth curve in matplotlib graphs

http://stackoverflow.com/questions/14705062/plotting-a-smooth-curve-in-matplotlib-graphs

now the plot looks and the code is... from netCDF4 import Dataset; from pylab import *; import numpy; from scipy import interpolate.. import spline # passing the filename root_grp = Dataset('C:\Python27\MyPrograms\nnt206rwpuvw.nc') # getting values of u..
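A small sketch of smoothing by interpolating onto a denser x-grid. Placeholder arrays stand in for the netCDF variables, and interp1d with kind='cubic' is used instead of the scipy.interpolate.spline helper the original code imports, since that helper has been removed from recent SciPy:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.interpolate import interp1d

    x = np.arange(10)                    # placeholder data instead of the netCDF variables
    y = np.random.rand(10)

    x_dense = np.linspace(x.min(), x.max(), 300)
    y_smooth = interp1d(x, y, kind="cubic")(x_dense)   # cubic interpolation on a fine grid

    plt.plot(x_dense, y_smooth)
    plt.plot(x, y, "o")
    plt.show()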

How to sort my paws?

http://stackoverflow.com/questions/4502656/how-to-sort-my-paws

when it does work with reasonable confidence. Training Dataset: From the pattern-based classifications where it worked correctly..

Calculating the percentage of variance measure for k-means?

http://stackoverflow.com/questions/6645895/calculating-the-percentage-of-variance-measure-for-k-means

example of KMeans clustering applied on the 'Fisher Iris Dataset' (4 features, 150 instances). We iterate over k = 1..10, plot the.. 'Petal Length'; plt.ylabel('Sepal Width'); plt.title('Iris Dataset KMeans clustering with K=%d' % K[kIdx]); plt.legend(); plt.show() EDIT#2..
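A compressed sketch of the elbow computation, with sklearn's KMeans and its inertia_ attribute used here as a stand-in for the scipy.cluster code in the thread:

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    X = load_iris().data                           # the Fisher Iris data: 150 instances, 4 features
    ks = range(1, 11)
    within_ss = [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in ks]  # within-cluster sum of squares

    total_ss = ((X - X.mean(axis=0)) ** 2).sum()        # total sum of squares
    explained = [1 - w / total_ss for w in within_ss]   # 'percentage of variance explained' for each k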

Using frequent itemset mining to build association rules?

http://stackoverflow.com/questions/7047555/using-frequent-itemset-mining-to-build-association-rules

if I go wrong somewhere. I have two datasets like this. Dataset 1: [A B C 0 E], [A 0 C 0 0], [A 0 C D E], [A 0 C 0 E]. The way I interpret.. time A, B, C, E occurred together and so did A, C; A, C, D, E; etc. Dataset 2: [5A 1B 5C 0 2E], [4A 0 5C 0 0], [2A 0 1C 4D 4E], [3A 0 4C 0 3E]. The way..
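A tiny illustration of counting itemset support in the first (binary) format using plain Python; this is only support counting, not a full Apriori or rule-generation implementation:

    from itertools import combinations
    from collections import Counter

    transactions = [                 # Dataset 1 from the question, with 0 meaning 'absent'
        {"A", "B", "C", "E"},
        {"A", "C"},
        {"A", "C", "D", "E"},
        {"A", "C", "E"},
    ]

    support = Counter()
    for t in transactions:
        for size in (2, 3):
            for itemset in combinations(sorted(t), size):
                support[itemset] += 1
    # e.g. support[("A", "C")] == 4: A and C occur together in every transaction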

Is there a good way to do this type of mining?

http://stackoverflow.com/questions/7076349/is-there-a-good-way-to-do-this-type-of-mining

there a good and scalable approach to achieve this? Sample Dataset: 1 23 1 23 2 23 3 23 4 23 5 23 6 23 7 23 8 23 9 23 10 23 11 23..