
Processing Every Word in a File with Python 3

'''
Created on Dec 21, 2012
Process every word in a file
@author: liury_lab
'''
import re

# Split each line on whitespace and print every word
with open('d:/text.txt', 'r', encoding='utf-8') as the_file:
    for line in the_file:
        for word in line.split():
            print(word, end='|')

# If the definition of a word changes, a regular expression can be used.
# For example, a word defined as a sequence of letters, digits,
# hyphens, or apostrophes
print()
print('************************************************************************')
re_word = re.compile(r"[\w'-]+")
with open('d:/text.txt', 'r', encoding='utf-8') as the_file:
    for line in the_file:
        for word in re_word.finditer(line):
            print(word.group(0), end='|')


# Wrap the logic in a generator
def words_of_file(file_path, line_to_words=str.split):
    # The with block closes the file even if the caller stops iterating early
    with open(file_path, 'r', encoding='utf-8') as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word


print()
print('************************************************************************')
for word in words_of_file('d:/text.txt'):
    print(word, end='|')


def words_by_re(file_path, repattern=r"[\w'-]+"):
    re_word = re.compile(repattern)

    def line_to_words(line):
        for mo in re_word.finditer(line):
            # The book used return here, which gave wrong results; changed to yield
            yield mo.group(0)
    return words_of_file(file_path, line_to_words)


print()
print('************************************************************************')
for word in words_by_re('d:/text.txt'):
    print(word, end='|')
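
The script above can be hard to try out without a file at d:/text.txt, so here is a minimal sketch that runs the same two word definitions on an in-memory string. It also shows what likely went wrong with the book's return version mentioned in the comment. The sample sentence and the function names split_words, regex_words, and regex_words_with_return are made up for illustration and do not come from the original script.

```python
import re

# Hypothetical sample text; not taken from the original article.
SAMPLE = "Don't split well-known words, please!"

# Same pattern as the script above: word characters, apostrophes, hyphens.
WORD_RE = re.compile(r"[\w'-]+")


def split_words(line):
    """Whitespace-delimited tokens; punctuation stays attached."""
    return line.split()


def regex_words(line):
    """Generator over every regex match in the line."""
    for mo in WORD_RE.finditer(line):
        yield mo.group(0)


def regex_words_with_return(line):
    """Buggy variant: returns the first match as a plain string and stops."""
    for mo in WORD_RE.finditer(line):
        return mo.group(0)


print(split_words(SAMPLE))
# ["Don't", 'split', 'well-known', 'words,', 'please!']
print(list(regex_words(SAMPLE)))
# ["Don't", 'split', 'well-known', 'words', 'please']
# Iterating over the returned string walks its characters, not words.
print(list(regex_words_with_return(SAMPLE)))
# ['D', 'o', 'n', "'", 't']
```

With return, the caller receives a single string per line and then iterates over its characters. A line with no match returns None, and iterating over None raises a TypeError. Using yield makes line_to_words a generator, which is what words_of_file expects.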
This article was originally written by 繁星123, a member of 玩蛇网 (www.iplaypython.com).

This is a 玩蛇网 article. When reposting, please credit the source and include the original URL: http://www.iplaypython.com/code/c243.html



Published: 2015-12-15 11:34, 玩蛇网 www.iplaypython.com
