30 Jul 2008

Former Lieutenants Take On Their Old Master: New Search Engine Cuil Challenges Google

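In the online world, Google is undoubtedly the giant among search engine websites. But on July 28, several high-profile Google "defectors" launched a new search engine, Cuil (pronounced like "cool"), to challenge their former employer. Cuil claims it can search 120 billion pages, about three times as many as Google, which would make it a formidable web search tool. Analysts, however, say that although Cuil is attracting a lot of attention, it is unrealistic to expect it to threaten Google's position any time soon. (Website: www.cuil.com)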


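Cuil was founded by Anna Patterson, former CTO of Google Base, together with her husband Costello; their two close associates, Russell Power and Louis Monier, are also former senior Google employees. Patterson has the most impressive background: she previously built the powerful search engine Recall on her own, and in 2004 Google bought it, together with her, at a high price. In 2006, preferring to go her own way, Patterson resigned and started building Cuil. Her husband Costello, who had helped IBM build its new search engine WebFountain, became her strongest helper.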

29 Jul 2008

pylab: draw circle

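With the help of MATLAB's most powerful function, help, I finally drew a circle in pylab. MATLAB doesn't come with a circle function either, so you have to write your own. What a hassle. First, the circle-drawing function in MATLAB: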
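function H = circle(center, radius, NOP, style)
% circle.m (hypothetical file name): draw a circle through NOP points
THETA=linspace(0,2*pi,NOP);
RHO=ones(1,NOP)*radius;
[X,Y] = pol2cart(THETA,RHO);
X=X+center(1);
Y=Y+center(2);
H=plot(X,Y,style);
axis square;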
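Following the same recipe, here is a Python version. Unfortunately pylab doesn't implement pol2cart, so I had to follow the same idea by hand:
linspace generates angles from 0 to 2*pi; then compute x and y, add the center coordinates, and pass the result to plot.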

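from pylab import linspace, pi, cos, sin, plot, axis

def myCircle(center, radius, NOP=100):
    # NOP evenly spaced angles from 0 to 2*pi
    theta = linspace(0, 2*pi, NOP)
    # polar -> Cartesian, shifted to the center
    x = radius * cos(theta) + center[0]
    y = radius * sin(theta) + center[1]
    h = plot(x, y)
    axis('equal')   # equal scaling on both axes so the circle looks round
    return h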
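Since pylab offers no pol2cart, the conversion can also be written as a small NumPy helper that mirrors the MATLAB code above. This is a minimal sketch; pol2cart and circle_points are hypothetical helper names of my own, not pylab functions, and they only compute coordinates, so no plotting backend is needed.

```python
import numpy as np

def pol2cart(theta, rho):
    """Hypothetical MATLAB-style helper: polar (theta, rho) -> Cartesian (x, y)."""
    theta = np.asarray(theta, dtype=float)
    rho = np.asarray(rho, dtype=float)
    return rho * np.cos(theta), rho * np.sin(theta)

def circle_points(center, radius, nop=100):
    """Return x, y arrays with nop points on a circle around center."""
    theta = np.linspace(0, 2 * np.pi, nop)
    rho = np.full(nop, radius, dtype=float)   # like ones(1,NOP)*radius
    x, y = pol2cart(theta, rho)
    return x + center[0], y + center[1]

x, y = circle_points((1.0, 2.0), 3.0, nop=8)
```

The arrays can be handed straight to plot(x, y); calling axis('equal') afterwards keeps the circle from looking like an ellipse.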

28 Jul 2008

Introduction to Python GUI toolkits

These days I have been working on my dissertation, which involves implementing some algorithms. During this period I need to analyze the results visually. As a Python newbie, I see two choices: Tkinter and wxPython. I will try to write some posts about both of them.

For wxPython, I will focus on combining it with matplotlib, since my work mainly involves matplotlib.

Matplotlib: matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in Python scripts, the Python and IPython shells (à la MATLAB or Mathematica), web application servers, and six graphical user interface toolkits.

The latest release of matplotlib is 0.98.1; it requires Python 2.4 or 2.5 and NumPy 1.1. You can download all of them from their websites if you need them.

19 Jul 2008

ZT (repost): Python blackboard notes

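1. Starting Python
Besides starting the interactive interpreter directly (remember that the end-of-file character is ^Z on Windows and ^D on *nix/Mac), two other ways are very common:
python -c "a=int(raw_input('a=')); print a*3"
This executes Python statements directly, without writing a script first. Note that the whole statement is wrapped in double quotes, while single quotes are used inside the Python code to avoid confusion.
python -i test.py
This drops you back into the interpreter after the script finishes. Note that -i must come before test.py. It also works together with -c, e.g.:
python -i -c "a=3"
Note: do not use input redirection (python < file) here.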

2. Command-line arguments of a script
They are stored in sys.argv, which is a list.
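A minimal sketch in Python 3 syntax (print is a function there). Passing argv into a hypothetical main() function instead of reading sys.argv inside it keeps the logic easy to test; args_demo.py in the usage note is a hypothetical file name.

```python
import sys

def main(argv):
    # argv[0] is the script name; the actual arguments start at argv[1]
    script, args = (argv[0] if argv else ''), argv[1:]
    print('script:', script)
    print('arguments:', args)
    return args

if __name__ == '__main__':
    main(sys.argv)
```

Saved as args_demo.py and run as python args_demo.py foo 42, it reports the arguments as ['foo', '42']: every argument arrives as a string.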

3. Comments
Comments start with # and, like // in C, run to the end of the line.

4. Output
Use print. Compound types such as lists can also be printed directly, for example:
import sys
print sys.argv
A trailing comma suppresses the newline. Note that print actually outputs a space after each comma, so the following prints a b:
print 'a',
print 'b'
Since Python supports introspection, you can also print the type of a variable:
print type(sys.argv)
When learning Python in the interpreter, use type(_) often to check the type of the last result.
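In Python 3, print became a function, so the trailing-comma trick is gone and the end keyword argument does the same job. The sketch below captures the output in a string only so that the result can be checked; redirect_stdout is from the standard contextlib module.

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()
with redirect_stdout(buf):        # collect everything print() writes
    print([1, 'two', 3.0])        # compound types print directly
    print('a', end=' ')           # Python 2: print 'a',
    print('b')
    print(type([]))               # introspection
output = buf.getvalue()
print(output, end='')
```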

5. Assignment
Works as in most languages, and assignments can be chained, e.g.
x = y = z = 0
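Chained assignment binds every name to the same object. With an immutable value such as 0 that is harmless, but with a mutable one such as a list all the names share a single object. A small Python 3 sketch of the difference:

```python
x = y = z = 0
x += 1                 # rebinds x only; y and z still refer to 0

a = b = []             # a and b name the same list object
a.append(1)            # so the change is visible through b as well

c, d = [], []          # two independent lists
c.append(1)

print(x, y, z, a, b, c, d)
```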

6. Numeric types
Integers, floating-point numbers and complex numbers are all supported.

* Integers have no size limit:
print 111111111*111111111
print int('12345678987654321')

* Floats use the hardware's IEEE 754 double-precision type:
print 1.0/3
print float('0.123') # constructor
print 0.1+0.1+0.1-0.3 # not zero!

* High-precision decimal floating point is provided by the Decimal class in the decimal module:
import decimal
from decimal import Decimal
print Decimal(1)/Decimal(7)
decimal.getcontext().prec = 100 # change precision. default is 28
print Decimal(1)/Decimal(7)
print Decimal('0.1')+Decimal('0.1')+Decimal('0.1')-Decimal('0.3') # exactly zero

* The real and imaginary parts of a complex number are both floats. The imaginary unit is written j or J.
a=3+4j
print a.real
print a.imag
print abs(a)
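The same checks in Python 3 syntax, as a sketch. decimal.localcontext() is used here instead of setting getcontext().prec, so the higher precision applies only inside the with block and does not leak into the rest of the program.

```python
from decimal import Decimal, localcontext

big = 111111111 * 111111111            # ints grow as needed, no overflow
drift = 0.1 + 0.1 + 0.1 - 0.3          # tiny nonzero error: 0.1 is not exact in binary
exact = Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')

with localcontext() as ctx:            # precision change limited to this block
    ctx.prec = 50
    seventh = Decimal(1) / Decimal(7)

z = 3 + 4j
print(big, drift, exact, seventh)
print(z.real, z.imag, abs(z))          # 3.0 4.0 5.0
```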

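7. Strings
String literals can be enclosed in single or double quotes; there are also raw strings r"" and multi-line strings (enclosed in """ or ''').
Strings are immutable.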
print 'a' + 'b' # ab
print 'a' * 3 # aaa
print 'abcd'[2] # c
print len('asdf') # 4
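The third usage is not valid in C.
With the default step, slicing always runs from left to right, and every slice is a half-open interval (closed on the left, open on the right), so [-2:] gives the last two characters.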
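A short Python 3 sketch of these operations. It also shows why half-open intervals are convenient (s[:k] + s[k:] always rebuilds s) and what happens when you try to modify a string in place.

```python
s = 'abcdef'
results = {
    'concat': 'a' + 'b',     # 'ab'
    'repeat': 'a' * 3,       # 'aaa'
    'index': s[2],           # 'c'
    'last_two': s[-2:],      # 'ef': from index -2 up to the end
    'middle': s[1:4],        # 'bcd': indices 1, 2, 3 (4 is excluded)
    'reversed': s[::-1],     # 'fedcba': a negative step walks right to left
}
# half-open intervals split cleanly: s[:k] + s[k:] == s for any k
assert all(s[:k] + s[k:] == s for k in range(len(s) + 1))

try:
    s[0] = 'X'               # strings are immutable
except TypeError as err:
    print('TypeError:', err)
```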

8. String methods by category
Predicates
* isalnum()/isalpha()/isdigit()/islower()/isspace()/istitle()/isupper()

Searching
* endswith/startswith(s[,start[,end]]) # returns a bool
* find/rfind/count(sub[,start[,end]])

Transforming
* expandtabs([tabsize]) # tabsize defaults to 8
* replace(old,new[,count]) # replaces all occurrences; if count is given, only the first count
* ljust/rjust(width[,fillchar])
* lower()/upper()/capitalize()/swapcase()/title()
* lstrip([chars])/rstrip([chars])/strip([chars])

Splitting and joining
* partition/rpartition(sep) # split at the first/last occurrence of sep, e.g. 'abc!=12'.partition('!=') returns ('abc','!=','12')
* split/rsplit([sep[,maxsplit]]) # splitting 'a--b---c' on '--' from the left and from the right gives different results. With no sep, the string is stripped and split on runs of whitespace
* join(seq) # e.g. '-'.join(['a','b','c']) returns 'a-b-c'
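A Python 3 sketch that runs a few of the methods listed above, including the split/rsplit difference on 'a--b---c' (the leftover '-' ends up on a different side depending on the direction).

```python
text = 'a--b---c'
examples = {
    'split': text.split('--'),                    # ['a', 'b', '-c']
    'rsplit': text.rsplit('--'),                  # ['a', 'b-', 'c']
    'split1': text.split('--', 1),                # ['a', 'b---c']
    'rsplit1': text.rsplit('--', 1),              # ['a--b-', 'c']
    'whitespace': '  x  y\tz '.split(),           # ['x', 'y', 'z']
    'partition': 'abc!=12'.partition('!='),       # ('abc', '!=', '12')
    'join': '-'.join(['a', 'b', 'c']),            # 'a-b-c'
    'replace': 'aaaa'.replace('a', 'b', 2),       # 'bbaa': only the first 2
    'find': 'hello'.find('l'),                    # 2
    'rfind': 'hello'.rfind('l'),                  # 3
    'startswith': 'hello'.startswith('ell', 1),   # True: checked from index 1
}
for name, value in examples.items():
    print(name, repr(value))
```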

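8. String formatting
The formatting operator is %, which works on both str and unicode, in the form format % values. If the format requires a single argument, values may be a single non-tuple object; otherwise values must be a tuple with exactly as many items as the format requires, or a single mapping (such as a dictionary). Example: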
print '%(language)s has %(#)03d quote types.' % \
{'language': "Python", "#": 2}
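The % operator still works in Python 3. This sketch shows the three shapes of values described above and the usual trap: a tuple given to a single %s is unpacked as several arguments, so it has to be wrapped in a one-element tuple.

```python
# mapping form (the example above)
msg = '%(language)s has %(#)03d quote types.' % {'language': 'Python', '#': 2}

# tuple form: one item per conversion
pair = '%s = %d' % ('x', 3)

# single conversion: a bare non-tuple value is fine...
single = 'value: %s' % 42
# ...but a tuple value must be wrapped, otherwise its items count as separate arguments
wrapped = 'value: %s' % ((1, 2),)
try:
    'value: %s' % (1, 2)
except TypeError as err:
    print('TypeError:', err)

print(msg, pair, single, wrapped, sep='\n')
```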


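9. Sequence operations
Reference: section 3.6 of the Library Reference.
Python has six sequence types: str, unicode, list, tuple, buffer and xrange.
Most sequences support the following operations: x in s, x not in s, s+t, s*n, n*s, s[i], s[i:j], s[i:j:k], len(s), min(s), max(s), where i, j, k and n are integers.
Note that multiplication (repetition) is a shallow copy: the same objects are referenced repeatedly, so after lists=[[]]*3; lists[0].append(3) you get [[3],[3],[3]].
For str, in/not in test for substrings, not just single characters.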
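A Python 3 check of both remarks. The list comprehension is the usual way to get independent inner lists, because it evaluates [] once per item instead of repeating one reference.

```python
lists = [[]] * 3                  # three references to ONE inner list
lists[0].append(3)                # visible through all three

fresh = [[] for _ in range(3)]    # three independent lists
fresh[0].append(3)

has_sub = 'ell' in 'hello'        # substring test, not just single characters
print(lists, fresh, has_sub)
```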

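10. Lists
Lists are the most commonly used sequence type, and their elements do not need to be of the same type.
Lists are mutable; you can even assign to a slice. Some classic examples:
>>> a = ['spam', 'eggs', 123, 1234]
>>> # Replace some items: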
... a[0:2] = [1, 12]
>>> a
[1, 12, 123, 1234]
>>> # Remove some:
... a[0:2] = []
>>> a
[123, 1234]
>>> # Insert some:
... a[1:1] = ['bletch', 'xyzzy']
>>> a
[123, 'bletch', 'xyzzy', 1234]
>>> # Insert (a copy of) itself at the beginning
>>> a[:0] = a
>>> a
[123, 'bletch', 'xyzzy', 1234, 123, 'bletch', 'xyzzy', 1234]
>>> # Clear the list: replace all items with an empty list
>>> a[:] = []
>>> a
[]

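11. Basic control flow
Note that Python uses indentation, rather than begin/end or {}, to delimit statement blocks. A block must contain at least one statement; if you really have nothing to put there, use pass.
* while statement
while b > 0:
    b -= 1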

* if-elif-else statement
Note that there is no switch statement as in C.
if x < 0:
    y = -x
elif x == 0:
    y = 0
else:
    y = x

* for statement
It iterates over the items of a sequence.
a = ['cat', 'dog', 'bird']
for x in a:
    print x, len(x)
Modifying a list while iterating over it is unsafe; iterate over a copy instead, e.g.:

>>> a = ['cat', 'window', 'defenestrate']
>>> for x in a[:]: # make a slice copy of the entire list
...     if len(x) > 6: a.insert(0, x)
...
>>> a
['defenestrate', 'cat', 'window', 'defenestrate']

To iterate over the indices, the range function is commonly used:
for i in range(len(a)):
    print i, a[i]

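* break and continue
break and continue work as in C, but leaving a loop through break is different from leaving it naturally (a for loop running out of items, or a while condition becoming false): only a natural exit runs the loop's else clause. Here is a good example:
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print n, 'equals', x, '*', n/x
            break
    else:
        # loop fell through without finding a factor
        print n, 'is a prime number'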
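The same for/else pattern wrapped in a function, in Python 3 syntax, so the result can be checked; primes_below is a hypothetical name. The last lines show enumerate(), which yields index and item together and is often clearer than range(len(a)).

```python
def primes_below(limit):
    """Return the primes in range(2, limit) using for/else."""
    primes = []
    for n in range(2, limit):
        for x in range(2, n):
            if n % x == 0:
                break            # found a factor: the else clause is skipped
        else:
            primes.append(n)     # inner loop finished without break: n is prime
    return primes

# enumerate() gives index and item together
a = ['cat', 'dog', 'bird']
indexed = [(i, x) for i, x in enumerate(a)]

print(primes_below(10))          # [2, 3, 5, 7]
print(indexed)
```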

18 Jul 2008

conditional operator/ternary operator in python

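Update 25.04.2009:
With a and b or c, b must be true; otherwise the result is wrong.

Many languages have a ternary operator, bool ? a : b, which is very convenient: it returns a if bool is true and b otherwise. In Python you need and and or to get the same effect; in fact, Python's logical operators can be quite a headache.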

A simple example:

>>> a = "first"
>>> b = "second"

>>> 1 and a or b
'first'
>>> 0 and a or b
'second'

The code above is equivalent to the ternary operator.

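and evaluates its operands from left to right and returns the first false one; if none is false, it returns the last operand.
or does the opposite: it returns the first true operand; if none is true, it returns the last operand.

With this in mind, the code above is much easier to understand.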
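A quick Python 3 check of these rules: and and or return one of their operands, not necessarily True or False.

```python
results = [
    0 and 'x',         # 0: first false operand
    1 and 'x',         # 'x': no false operand, so the last one
    1 and 2 and 3,     # 3
    '' or 'default',   # 'default': first true operand
    'v' or 'default',  # 'v'
    [] or {},          # {}: all false, so the last operand
]
print(results)
```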

One of the example scripts contains this line:

processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)


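This makes nice use of and and or: depending on the value of collapse, processFunc is assigned one of the two functions. Very convenient. It is safe here because a lambda is always true, so the requirement that b be true is met.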


Update:

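If you come from the C/C++ or Java world, it is hard to ignore the fact that for a long time Python had no conditional expression (C ? X : Y), also called the ternary operator. (C is the condition; X is the result when C is True, Y the result when C is False.) Guido van Rossum refused for a long time to add such a feature, because he believed code should be kept simple so that programmers would not easily make mistakes. After more than ten years, however, he gave in, mainly because people kept trying to emulate it with and and or, mostly incorrectly. According to the FAQ, the correct way (though not the only one) is (C and [X] or [Y])[0]. The only problem was that the community could not agree on a syntax. (See PEP 308 for the different proposals.) There was strong demand for a solution.
Guido van Rossum finally chose the most favored proposal (which was also his own favorite) and applied it to some modules in the standard library. According to the PEP, this review looked at a large number of real-world cases, covering different applications and code written by different programmers. In the end, the syntax integrated into Python 2.5 is: X if C else Y.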
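The sketch below compares the three spellings when X is false (here 0), which is exactly the case the 2009 update warns about. The function names are hypothetical; cond_expr needs Python 2.5 or later.

```python
def and_or(c, x, y):
    return c and x or y              # breaks when x is false

def faq_idiom(c, x, y):
    return (c and [x] or [y])[0]     # [x] is a non-empty list, hence always true

def cond_expr(c, x, y):
    return x if c else y             # Python 2.5+ conditional expression

for f in (and_or, faq_idiom, cond_expr):
    print(f.__name__, repr(f(True, 0, 'fallback')))

# the processFunc line rewritten with a conditional expression
collapse = True
processFunc = (lambda s: ' '.join(s.split())) if collapse else (lambda s: s)
```

With c true and x equal to 0, and_or wrongly returns 'fallback', while the other two return 0.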

17 Jul 2008

Python file operations

file is a built-in class in Python. Everything in Python is an object, and so is a file: it has attributes and methods. Let's look at them:

file(name[, mode[, buffering]])


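file() creates a file object; it is a built-in. You will often see open() used instead: it takes the same arguments, and the Python documentation actually recommends open() over calling the file constructor directly. name and mode are strings; buffering is an integer (0 unbuffered, 1 line buffered, a larger value the approximate buffer size, a negative value the system default).

name: file name;
mode: the basic modes are r, w, a and U. Adding + opens the file for updating (r+, w+, a+), and adding b opens it in binary mode.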
r: read mode;
w: write mode;
a: append mode;
U is the special universal-newline mode. On Unix/Linux a line break is '\n', while on Windows it is '\r\n'. In U mode both (and the old Mac '\r') are read as '\n', and the newlines attribute records which line endings were actually found; if there was more than one kind, it is a tuple.
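In Python 3, text mode applies universal newlines by default (the U flag was deprecated and removed in Python 3.11), and the newlines attribute still reports which endings were seen. A sketch using a throwaway file in a temporary directory; mixed.txt is a hypothetical name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mixed.txt')
    with open(path, 'wb') as f:               # binary mode: bytes written as-is
        f.write(b'unix\nwindows\r\nold mac\rend')
    with open(path) as f:                     # text mode, newline=None by default
        text = f.read()                       # every ending becomes '\n'
        seen = f.newlines                     # endings actually encountered
print(repr(text), seen)
```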


file's attributes:

closed # whether the file has been closed
encoding # encoding type
mode # open mode
name # file name
newlines # line endings found so far in U mode: None, a string, or a tuple of all kinds seen
softspace # boolean; indicates whether a space character needs to be printed before another value when using the print statement


file's methods:

read([size]) # size is the number of bytes to read; reads the whole file if omitted.
readline([size]) # read ONE line; returns less than a full line if size is smaller than the line length.
readlines([size]) # read the remaining lines into a list; if size is given, read whole lines totalling approximately size bytes instead of reading to the end.
write(str) # write str to the file; no newline is added unless str contains one.
writelines(seq) # write a sequence of strings; no newlines are added either.
close() # close the file. Python closes a file once it is no longer referenced, but this is not guaranteed, so it is better to call close() yourself.
flush() # write buffered content to the file.
fileno() # return the integer file descriptor.
isatty() # True if the file is connected to a terminal (tty).
tell() # current position, measured from the beginning of the file.
next() # return the next line; this is what a for loop over the file uses. Raises StopIteration at the end.
seek(offset[,whence]) # move the position to offset, counted from the beginning unless whence says otherwise: 0 the beginning, 1 the current position, 2 the end. Note: in mode a or a+, any seek() is undone at the next write, so writes always go to the end of the file.
truncate([size]) # truncate the file to at most size bytes (default: the current position). If size is larger than the file, the result depends on the OS: the file may stay unchanged, be padded with zeros, or be extended with undefined content.
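Python 3 dropped the file() built-in; open() returns the file object. The sketch below exercises the methods above on a throwaway file; file_demo and demo.txt are hypothetical names. The with statement (available in Python 2.5 via from __future__ import with_statement, standard from 2.6) closes the file even if an exception occurs, which addresses the close() caveat above.

```python
import os
import tempfile

def file_demo(path):
    with open(path, 'w') as f:                   # 'w' creates or empties the file
        f.write('first line\n')                  # write() adds no newline itself
        f.writelines(['second\n', 'third\n'])    # neither does writelines()
    with open(path, 'a') as f:                   # 'a' always writes at the end
        f.write('appended\n')
    with open(path) as f:
        first = f.readline()                     # one line, newline included
        rest = f.readlines()                     # the remaining lines as a list
        f.seek(0)                                # back to the beginning (whence=0)
        everything = f.read()                    # the whole file
    with open(path, 'r+') as f:
        f.truncate(5)                            # keep only the first 5 bytes
    with open(path) as f:
        truncated = f.read()
    return first, rest, everything, truncated

with tempfile.TemporaryDirectory() as tmp:
    first, rest, everything, truncated = file_demo(os.path.join(tmp, 'demo.txt'))
print(first, rest, everything, truncated, sep='|')
```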

8 Jul 2008

Smart Kung Fu Panda

One often meets his destiny on the road he takes to avoid it.

Your mind is like this water, my friend. When it is agitated, it becomes difficult to see. But if you allow it to settle, the answer becomes clear.

Yesterday is history, tomorrow is a mystery, but today is a gift. That is why it’s called the present (the gift).

There are no accidents.

Shifu: But there are things we can control. I can control when the fruit will fall... and I can control where to plant the seed.

Oogway: Yes, but no matter what you do, that seed will grow to be a peach tree. You may wish for an apple or an orange, but you will get a peach.

Shifu: But a peach cannot defeat Tai Lung.

Oogway: Maybe it can, if you are willing to guide it, to nurture it, to believe in it.

The secret ingredient of my secret ingredient soup is... nothing.

To make something special, you just have to believe it’s special.

Benchmark Python algorithms

The Python library module time has several useful functions
which we can use to benchmark code segments or algorithms.
Typical uses look like this:

>>> import time
>>> t=time.clock()
>>> t
4.4698418374402335e-006

>>> t=time.clock()
>>> time.clock()-t
6.635907915670856

>>> t=time.time()
>>> time.time()-t
5.0940001010894775
>>>

As the code segments above show, both clock() and time() can be used to measure elapsed time. There are still some differences between them, though; the following is copied from the official documentation:
clock( )
On Unix, return the current processor time as a floating point number expressed in seconds. The precision, and in fact the very definition of the meaning of ``processor time'', depends on that of the C function of the same name, but in any case, this is the function to use for benchmarking Python or timing algorithms.

On Windows, this function returns wall-clock seconds elapsed since the first call to this function, as a floating point number, based on the Win32 function QueryPerformanceCounter(). The resolution is typically better than one microsecond.


and for the time():
time( )
Return the time as a floating point number expressed in seconds since the epoch, in UTC. Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls.

If we only read the definitions of these two functions, we would think that clock() is more precise than time(). But if you run the code above, you will find that time() shows more digits than clock(). I don't know why at the moment.

For more details, please read:
http://docs.python.org/lib/module-time.html
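time.clock() was deprecated in Python 3.3 and removed in 3.8. A Python 3 sketch of the same measurements uses time.perf_counter() (a high-resolution wall-clock timer), time.process_time() (CPU time of the process, like clock() on Unix) and the timeit module; work() is a hypothetical stand-in workload. The last lines suggest a likely explanation for the observation above: subtracting two large epoch timestamps leaves floating-point rounding noise in the trailing digits, so the extra digits printed for time() do not mean extra resolution. The timestamp value t0 is hypothetical.

```python
import time
import timeit

def work(n=100000):
    # hypothetical workload: sum of squares
    return sum(i * i for i in range(n))

start = time.perf_counter()            # high-resolution wall-clock timer
work()
elapsed = time.perf_counter() - start

cpu_start = time.process_time()        # CPU time of this process only
work()
cpu_elapsed = time.process_time() - cpu_start

# timeit.repeat returns one total per repeat, each covering `number` calls
best_per_call = min(timeit.repeat(work, number=10, repeat=3)) / 10

# subtracting two large epoch-style timestamps leaves float rounding noise
t0 = 1216000000.0                      # hypothetical timestamp from around July 2008
diff = (t0 + 5.094) - t0               # close to 5.094, but the last digits are noise

print('wall %.6f s, cpu %.6f s, best per call %.6f s' % (elapsed, cpu_elapsed, best_per_call))
print(repr(diff))
```

For small snippets, timeit is usually the better choice, since it repeats the measurement and lets you take the minimum.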

7 Jul 2008

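Google and Kingsoft jointly release translation software

Only yesterday did I see this big news, released a few months ago.
For someone like me who loves free software, this is wonderful news.
Kingsoft's experience in translation plus Google's mighty search capabilities
will surely open up a new chapter for search that combines online and local resources.
I downloaded it and tried it out; it feels good.
The screen word-capture feature of this version is especially strong.
My old 2007 Tornado edition did not support Firefox,
but this version works with it perfectly,
so my Firefox extension qtl has been dismissed.

Here is the link, go try it: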
http://www.google.cn/rebang/product/dictionary/dictionary.html

5 Jul 2008

random module in Python

>>> import random

>>> # 0.0 <= float < 1
>>> random.random()
0.41360177662769904

>>> # 10.0 <= float <= 20 (whether 20 itself can occur depends on float rounding)
>>> random.uniform(10,20)
15.743669918803288

>>> # 10 <= int <= 20 (can be 20)
>>> random.randint(10,20)
10

>>> # 10 <= int < 20, even numbers only (step=2)
>>> random.randrange(10,20,2)
16

>>> # choose from a list
>>> random.choice([1, 2, 3, 5, 9])
2

>>> # shuffle a list in place
>>> cards = range(52)
>>> random.shuffle(cards) # order is random now
>>> cards[:5] # get 5 cards
[37, 14, 42, 44, 6]
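The same calls in Python 3, as a sketch. Two details differ: range() no longer returns a list, so shuffle() needs list(range(52)), and using a seeded random.Random instance makes runs reproducible (the seed 2008 is arbitrary). random.sample() draws distinct items without shuffling the whole deck.

```python
import random

rng = random.Random(2008)                                # seeded: same sequence every run

floats = [rng.uniform(10, 20) for _ in range(1000)]
ints = [rng.randint(10, 20) for _ in range(1000)]        # both ends included
evens = [rng.randrange(10, 20, 2) for _ in range(1000)]  # 10, 12, ..., 18

cards = list(range(52))          # Python 3: range() is not a list
rng.shuffle(cards)               # shuffles in place, returns None
hand = rng.sample(range(52), 5)  # 5 distinct cards
print(cards[:5], hand)
```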

2 Jul 2008

Finding a Space address from an email address

Belem used it to build a small tool that finds a Space address from an email address: MailtoSpace


Enter the MSN account you want to look up, click "Get shared space link", and after a moment the system works out the Space address for that account.

Note: if the account has not created a space, or its Live Space has no profile (personal information) module, its space address cannot be found.


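One last tip: you can visit anyone's Live Space directly with a URL like http://spaces.live.com/profile.aspx?mem=gerry@live.com, whether or not they have added the profile module to their Live Space... (No eggs or vegetables, please; just throw me an egg pancake, it saves trouble...)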

1 Jul 2008

PyScripter: the perfect Python IDE on Windows

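Bored today, I went looking around for a Python IDE. Since I'm used to EndNote, I had to give up my plan of living on Ubuntu during the thesis phase
and move back to Windows. After installing Python 2.5 I started looking for a suitable IDE. The bundled IDLE is fine,
but its keyword completion leaves much to be desired, and PSPad only highlights keywords without auto-completion.
After searching around I found PyScripter. Ha, it suits me.
The only pity is that the interface font is too small, but its other features are unbeatable.
Especially worth mentioning is its documentation view of the source code, which can generate HTML documentation automatically, hehe.
As long as you write well-formatted code, the generated documentation is really beautiful.
Thumbs up!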