Project2-Cat
实现一个金山打字通(?)
具体实现:记录打字速度以及自动修正拼写错误的字符
Phase 1: Typing
实现打字以及检测打字速度相关功能
Problem 1
挑选用户打字的段落
pick函数
参数:
- paragraphs:一串字符串记录了打字内容
- select:一个函数,检测段落是否能被选择
- k:非负数,作为index
思路:
- 函数功能实现了用户选择第k个段落作为打字内容
- 若选择的k没有对应段落则返回空字符串
- 选择的字符串要符合select函数的条件
- 符合条件的才能编号第k个字符串
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def pick(paragraphs, select, k):
"""Return the Kth paragraph from PARAGRAPHS for which SELECT called on the
paragraph returns True. If there are fewer than K such paragraphs, return
the empty string.
Arguments:
paragraphs: a list of strings
select: a function that returns True for paragraphs that can be selected
k: an integer
>>> ps = ['hi', 'how are you', 'fine']
>>> s = lambda p: len(p) <= 4
>>> pick(ps, s, 0)
'hi'
>>> pick(ps, s, 1)
'fine'
>>> pick(ps, s, 2)
''
"""
# BEGIN PROBLEM 1
valid_para = []
for s in paragraphs:
if select(s):
valid_para.append(s)
if k>=len(valid_para):
return ''
else:
return valid_para[k]
# END PROBLEM 1Problem 2
通过给定的关键词选取段落
about函数
参数:subject:关键词列表
思路:
- about函数用于pick函数中的select选项,用于筛选指定关键字的段落
- 故它返回的是一个select函数,若段落满足条件则返回True否则返回False
- 单词匹配时不区分大小写,但要是一个完整单词,且不能是单词的字串
- 可以使用给定的函数remove_punctuation-去除标点,lower-变小写,split-将一句话分割为若干单词存入列表
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def about(subject):
"""Return a select function that returns whether
a paragraph contains one of the words in SUBJECT.
Arguments:
subject: a list of words related to a subject
>>> about_dogs = about(['dog', 'dogs', 'pup', 'puppy'])
>>> pick(['Cute Dog!', 'That is a cat.', 'Nice pup!'], about_dogs, 0)
'Cute Dog!'
>>> pick(['Cute Dog!', 'That is a cat.', 'Nice pup.'], about_dogs, 1)
'Nice pup.'
"""
assert all([lower(x) == x for x in subject]), 'subjects should be lowercase.'
# BEGIN PROBLEM 2
def select(para):
sp_para = split(remove_punctuation(para))
for i in range(0,len(sp_para)):
sp_para[i]=lower(sp_para[i])
for s in subject:
if s in sp_para:
return True
return False
return select
# END PROBLEM 2Problem 3
计算已经输入且匹配单词占需要输入内容的百分比
accuracy函数
参数:
- typed:已经输入的内容
- source:需要输入的内容
思路:
- 按单词顺序匹配指定输入内容,第一个对第一个,第二个对第二个….
- 区分大小写,且包含标点符号
- 若已经输入内容比需要输入内容长,则长的部分认定为不正确
- 若两者均为空字符串则准确率为100.0,若前者为空后者非空或前者非空后者为空则准确率为0.0
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
def accuracy(typed, source):
"""Return the accuracy (percentage of words typed correctly) of TYPED
when compared to the prefix of SOURCE that was typed.
Arguments:
typed: a string that may contain typos
source: a string without errors
>>> accuracy('Cute Dog!', 'Cute Dog.')
50.0
>>> accuracy('A Cute Dog!', 'Cute Dog.')
0.0
>>> accuracy('cute Dog.', 'Cute Dog.')
50.0
>>> accuracy('Cute Dog. I say!', 'Cute Dog.')
50.0
>>> accuracy('Cute', 'Cute Dog.')
100.0
>>> accuracy('', 'Cute Dog.')
0.0
>>> accuracy('', '')
100.0
"""
typed_words = split(typed)
source_words = split(source)
# BEGIN PROBLEM 3
cnt = 0
for t, s in zip(typed_words, source_words):
if t==s:
cnt+=1
if len(typed_words)==0 and len(source_words)==0:
return 100.0
elif len(typed_words)==0 or len(source_words)==0:
return 0.0
else:
return (cnt/len(typed_words))*100
# END PROBLEM 3注意:
- 同时遍历两个列表时,使用zip函数解包,因为原代码实现的是遍历一个包含两个列表的元组
- 空字符部分特判,因为计算百分比时可能会导致分母为0
- 计算的是已经输入字符与需要输入字符匹配的字符个数在已经输入字符中的占比,分母是typed_words的长度
Problem 4
按照words/min计算打字速度
wpm函数
参数:
- typed:已经输入的内容
- elapsed:总共打字时间(按秒计)
思路:
- wpm的计算是按照字符数来的,单位是每五个字符数,这样可以减少单词长度对结果的影响
- \(wpm = \frac{\frac{字符数}{5}}{时间(min)}\)
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
def wpm(typed, elapsed):
"""Return the words-per-minute (WPM) of the TYPED string.
Arguments:
typed: an entered string
elapsed: an amount of time in seconds
>>> wpm('hello friend hello buddy hello', 15)
24.0
>>> wpm('0123456789',60)
2.0
"""
assert elapsed > 0, 'Elapsed time must be positive'
# BEGIN PROBLEM 4
length = len(typed)
return (length/5)/(elapsed/60)
# END PROBLEM 4注意:elapsed单位是秒,wpm使用的时间是分钟
Phase 2: Autocorrect
按下空格触发单词自动纠正,若最近的一个词接近正确词汇但不正确,则会用正确词汇替代
Problem 5
返回一个列表,内部提供了几个接近输入单词的正确单词
参数:
- typed_word:一个字符串,表示输入的单词
- word_list:源单词列表
- diff_function:评估单词不同程度的函数
- limit:单词能否被更改的阈值
思路:
autocorrect实现的
- 若输入字符已经在word_list中,将会直接返回该字符
- 否则会返回基于diff函数计算的不同程度最小的单词
- 若输入单词与word_list中最小的不同程度仍大于limit,则返回输入单词
所有输入单词和word_list中的单词都是小写且没有标点
若多个单词与输入字符不同程度最小相同,则返回最靠前的
不同程度可以使用两个单词相同部分长度衡量
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
def autocorrect(typed_word, word_list, diff_function, limit):
"""Returns the element of WORD_LIST that has the smallest difference
from TYPED_WORD. If multiple words are tied for the smallest difference,
return the one that appears closest to the front of WORD_LIST. If the
difference is greater than LIMIT, instead return TYPED_WORD.
Arguments:
typed_word: a string representing a word that may contain typos
word_list: a list of strings representing source words
diff_function: a function quantifying the difference between two words
limit: a number
>>> ten_diff = lambda w1, w2, limit: 10 # Always returns 10
>>> autocorrect("hwllo", ["butter", "hello", "potato"], ten_diff, 20)
'butter'
>>> first_diff = lambda w1, w2, limit: (1 if w1[0] != w2[0] else 0) # Checks for matching first char
>>> autocorrect("tosting", ["testing", "asking", "fasting"], first_diff, 10)
'testing'
"""
# BEGIN PROBLEM 5
if typed_word in word_list:
return typed_word
word_diff = [diff_function(typed_word, s, limit) for s in word_list]
min_diff = min(word_diff)
if min_diff > limit:
return typed_word
else :
return word_list[word_diff.index(min_diff)]
# END PROBLEM 5注意:
- 可以使用
list.index(<val>)获取列表中某值的下标
Problem 6
返回为纠正单词需要修改的字符个数
参数:
- typed:输入单词
- source:目标单词
- limit:最多修改字符数
思路:
比较个位字符,若不同则执行更改,记录更改数
若一边比另一边长,长度的不同也算入更改数
若更改数比limit打,则返回任何大于limit的数(为了避免多余的计算)
要求使用递归
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
def feline_fixes(typed, source, limit):
"""A diff function for autocorrect that determines how many letters
in TYPED need to be substituted to create SOURCE, then adds the difference in
their lengths and returns the result.
Arguments:
typed: a starting word
source: a string representing a desired goal word
limit: a number representing an upper bound on the number of chars that must change
>>> big_limit = 10
>>> feline_fixes("nice", "rice", big_limit) # Substitute: n -> r
1
>>> feline_fixes("range", "rungs", big_limit) # Substitute: a -> u, e -> s
2
>>> feline_fixes("pill", "pillage", big_limit) # Don't substitute anything, length difference of 3.
3
>>> feline_fixes("roses", "arose", big_limit) # Substitute: r -> a, o -> r, s -> o, e -> s, s -> e
5
>>> feline_fixes("rose", "hello", big_limit) # Substitute: r->h, o->e, s->l, e->l, length difference of 1.
5
"""
# BEGIN PROBLEM 6
# assert False, 'Remove this line'
def compute(typed_w,source_w,count):
if count > limit:
return limit+1
if typed_w == "" and source_w == "":
return count
elif typed_w == "":
return compute(typed_w,source_w[1:],count+1)
elif source_w == "":
return compute(typed_w[1:],source_w,count+1)
elif typed_w[0]==source_w[0]:
return compute(typed_w[1:],source_w[1:],count)
else:
return compute(typed_w[1:],source_w[1:],count+1)
return compute(typed,source,0)
# END PROBLEM 6注意:
- 超过limit限制的一律设为limit+1
- 通过slice来每次递归往后截取一位字符,比较每次截取后的首字符来判断相同与否,不相同让count+1
- 递归出口为两字符串均被截取成空串,返回计数
- 长度不同的情况下,已经被截成空串的不再截取,长串继续截取,该情况下直接让count+1
Problem 7
返回将输入字符改为目标字符所需要执行的操作次数,操作有以下:
- 添加字符
- 删除字符
- 替换字符
参数:
- typed:输入单词
- source:目标单词
- limit:最多修改字符数
思路:
- 比较输入单词与指定单词,寻找不同位置
- 若需要修改数量大于limit,则返回任何大于limit的值
- 代码需要使用递归,应需要三个递归调用以及两个递归出口
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def minimum_mewtations(typed, source, limit):
"""A diff function that computes the edit distance from TYPED to SOURCE.
This function takes in a string TYPED, a string SOURCE, and a number LIMIT.
Arguments:
typed: a starting word
source: a string representing a desired goal word
limit: a number representing an upper bound on the number of edits
>>> big_limit = 10
>>> minimum_mewtations("cats", "scat", big_limit) # cats -> scats -> scat
2
>>> minimum_mewtations("purng", "purring", big_limit) # purng -> purrng -> purring
2
>>> minimum_mewtations("ckiteus", "kittens", big_limit) # ckiteus -> kiteus -> kitteus -> kittens
3
"""
# assert False, 'Remove this line'
if limit<0: # Base cases should go here, you may add more base cases as needed.
# BEGIN
return 0
# END
# Recursive cases should go below here
if typed == "" and source == "": # Feel free to remove or add additional cases
# BEGIN
return 0
# END
elif typed == "" or source == "":
return abs(len(typed)-len(source))
elif typed[0] == source[0]:
return minimum_mewtations(typed[1:],source[1:],limit)
else:
add = minimum_mewtations(typed,source[1:],limit-1)
remove = minimum_mewtations(typed[1:],source,limit-1)
substitute = minimum_mewtations(typed[1:],source[1:],limit-1)
# BEGIN
return min(add,remove,substitute)+1
# END注意:
和上一个problem类似,使用递归与slice一位位判断
三种操作依次递归,并取最小值
Phase 3: Multiplayer
实现多人对战模式
Problem 8
将玩家输入进度与信息传入多人服务器并返回对应信息
report_progress函数
参数:
- typed:输入的字符,列表存储
- source:需要输入的字符,列表存储
- user_id:当前玩家id
- upload:上传进度的函数
思路:
- 将输入字符与需要输入字符比较,计算输入进度
- 输入进度指已经输入正确单词与需要输入正确单词的比例(因此,若中间有一个单词打错,后面再正确也不会记录)
- 函数的返回值是计算的进度
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
def report_progress(typed, source, user_id, upload):
"""Upload a report of your id and progress so far to the multiplayer server.
Returns the progress so far.
Arguments:
typed: a list of the words typed so far
source: a list of the words in the typing source
user_id: a number representing the id of the current user
upload: a function used to upload progress to the multiplayer server
>>> print_progress = lambda d: print('ID:', d['id'], 'Progress:', d['progress'])
>>> # The above function displays progress in the format ID: __, Progress: __
>>> print_progress({'id': 1, 'progress': 0.6})
ID: 1 Progress: 0.6
>>> typed = ['how', 'are', 'you']
>>> source = ['how', 'are', 'you', 'doing', 'today']
>>> report_progress(typed, source, 2, print_progress)
ID: 2 Progress: 0.6
0.6
>>> report_progress(['how', 'aree'], source, 3, print_progress)
ID: 3 Progress: 0.2
0.2
"""
# BEGIN PROBLEM 8
typed_count = 0
for t,s in zip(typed,source):
if t != s:
break
typed_count+=1
ratio = typed_count/len(source)
upload({'id': user_id, 'progress': ratio})
return ratio
# END PROBLEM 8注意:
- 判断是否相等时,若发现不等就需要break结束循环
Problem 9
计算两个玩家输入每个单词时的用时差
time_per_word函数
参数:
- words:按输入顺序排列的单词列表
- timestamps_per_player:一个二维列表,记录了每位玩家开始与结束输入每个单词的时间
data abstraction:match
参数:
- words:输入单词组成的单词列表
- times:二维列表,记录了每个玩家输入每个单词需要多长时间,如
times[i][j]指玩家i输入单词words[j]所花费的时间
思路:
- timestamps_per_player记录的是每个玩家敲每个单词的开始与结束时间
- 传入match后,时间会自动计算为敲每个单词使用的时间(结束-开始),并存入每个单词key对应的value中
- 通过get_all_word与get_all_times函数可以返回match中对应的键与值形成的列表
- 通过get_word与time函数可以返回指定的键与值
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def time_per_word(words, timestamps_per_player):
"""Given timing data, return a match data abstraction, which contains a
list of words and the amount of time each player took to type each word.
Arguments:
words: a list of words, in the order they are typed.
timestamps_per_player: A list of lists of timestamps including the time
the player started typing, followed by the time
the player finished typing each word.
>>> p = [[75, 81, 84, 90, 92], [19, 29, 35, 36, 38]]
>>> match = time_per_word(['collar', 'plush', 'blush', 'repute'], p)
>>> get_all_words(match)
['collar', 'plush', 'blush', 'repute']
>>> get_all_times(match)
[[6, 3, 6, 2], [10, 6, 1, 2]]
"""
# BEGIN PROBLEM 9
times = []
for l in timestamps_per_player:
each_times = []
for i in range(1,len(l)):
each_times.append(l[i]-l[i-1])
times.append(each_times)
return match(words,times)
# END PROBLEM 9注意:
该函数的作用是通过调用match函数构造该抽象类型
但match函数需要每个单词的时间差,故需要在该函数中处理输入的时间列表(结束-开始)
将处理后的列表与传入该函数的单词列表传入match函数并返回获得的抽象类型
Problem 10
返回每个单词哪位玩家敲得块
fastest_words函数
参数:
- match:match函数中获得的抽象数据类型
思路:
- 比较多个玩家输入每个字符时间,选择输入时间最少的
- 返回的列表中存储了每个玩家输入最快的单词组成的列表
- 若多个玩家输入时间相同,则取编号更小的玩家
code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def fastest_words(match):
"""Return a list of lists of which words each player typed fastest.
Arguments:
match: a match data abstraction as returned by time_per_word.
>>> p0 = [5, 1, 3]
>>> p1 = [4, 1, 6]
>>> fastest_words(match(['Just', 'have', 'fun'], [p0, p1]))
[['have', 'fun'], ['Just']]
>>> p0 # input lists should not be mutated
[5, 1, 3]
>>> p1
[4, 1, 6]
"""
player_indices = range(len(get_all_times(match))) # contains an *index* for each player
word_indices = range(len(get_all_words(match))) # contains an *index* for each word
# BEGIN PROBLEM 10
result = [[] for i in player_indices]
for i in word_indices:
min_player = -1
for j in player_indices:
if min_player == -1 or time(match, j, i) < time(match, min_player, i):
min_player = j
result[min_player].append(get_word(match,i))
return result
# END PROBLEM 10注意:
- 列表中创建多个列表要用for表达式
[[] for _ in player_indices] - result列表中下标表示玩家,对应元素表示该玩家最快的单词组成的列表
- 我们首先遍历单词,然后遍历玩家找出时间最小的玩家,并将对应单词存入该玩家对应列表