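Update, Feb 2, 5 p.m.: added the last three questions. I got halfway through this morning and ran out of motivation, but it turns out people are actually reading, so I figured I'd finish it. Realistically, getting through the first three questions on day one is already good going; come back for my thoughts on the later ones once you've done that.
Update, Feb 2, 10 p.m.: I've posted a new article with a simple implementation of Question 1, for reference.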
2024 MCM Problem C, Question 1 walkthrough: a total beginner's personal take, not from a tutoring agency - article by 豆包儿炒饭 - Zhihu
The original post follows:
Background
First glance at the title: Problem C: Momentum in Tennis
First glance at the picture:
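Whoa, don't tell me the data-analysis problem has turned into the physics of a tennis ball's momentum? Unlikely. Let's keep reading.
"In the 2023 Wimbledon Gentlemen’s final, 20-year-old Spanish rising star Carlos Alcaraz defeated 36-year-old Novak Djokovic. The loss was Djokovic’s first at Wimbledon since 2013 and ended a remarkable run for one of the all-time great players in Grand Slams."
Background: a youngster beat a Grand Slam legend. Seriously impressive.
Doesn't seem to have anything to do with this Momentum, though? Moving on.
"The match itself was a remarkable battle.[1]"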
OK, next it talks about the match itself.
"Djokovic seemed destined to win easily as he dominated the first set 6 – 1 (winning 6 of 7 games). The second set, however, was tense and finally won by Alcaraz in a tie-breaker 7 – 6. The third set was the reverse of the first, Alcaraz winning handily 6 – 1. The young Spaniard seemed in total control as the fourth set started, but somehow the match again changed course with Djokovic taking complete control to win the set 6 – 3. The fifth and final set started with Djokovic carrying the edge from the fourth set, but again a change of direction occurred and Alcaraz gained control and the victory 6 – 4."
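OK, in short the match swung back and forth, so Momentum here means the momentum of the match.
"The data for this match is in the provided data set, “match_id” of “2023-wimbledon-1701”."
Now it gets to the data; what follows should be read alongside the data table. Let me put the data up first.
When reading the problem there's no rush to load the data with code; opening it in Excel and searching is fine.
"You can see all the points for the first set when Djokovic had the edge using the “set_no” column equal to 1."
Let's see what's in the set_no column first.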
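If you'd rather peek with code than with Excel, here is a minimal pandas sketch of the same lookup. Only the `match_id` and `set_no` column names and the id `2023-wimbledon-1701` come from the problem statement; the `point_no` column, the second match id and every row below are made-up stand-ins so the snippet runs without the contest file.

```python
import io
import pandas as pd

# Tiny stand-in for the contest CSV. Only `match_id` and `set_no` are named
# in the problem statement; `point_no` and all rows here are invented.
csv_text = """match_id,set_no,point_no
2023-wimbledon-1701,1,1
2023-wimbledon-1701,1,2
2023-wimbledon-1701,2,3
2023-wimbledon-1301,1,1
"""
points = pd.read_csv(io.StringIO(csv_text))

# Keep only the final, then only its first set.
final = points[points["match_id"] == "2023-wimbledon-1701"]
first_set = final[final["set_no"] == 1]

print(sorted(final["set_no"].unique()))  # which sets appear in this match
print(len(first_set))                    # how many points the first set has
```

With the real file you would swap the `io.StringIO(...)` argument for the path to the provided CSV.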
OK, set_no is simply the set number.
"The incredible swings, sometimes for many points or even games, that occurred in the player who seemed to have the advantage are often attributed to “momentum.”"
Whoever has the upper hand in momentum has the advantage in the match.
OK, so it's only now that it finally gets to momentum. Tsk.
"One dictionary definition of momentum is “strength or force gained by motion or by a series of events.”[2] In sports, a team or player may feel they have the momentum, or “strength/force” during a match/game, but it is difficult to measure such a phenomenon. Further, it is not readily apparent how various events during the match act to create or change momentum if it exists."
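Right, a whole lot of explanation. The key is to hold on to two points, plus one caveat:
1. Momentum in a match is hard to quantify;
2. It's hard to work out how momentum changes.
3. There's also that final “if it exists”. The “it” is Momentum, which means the momentum in the problem doesn't necessarily exist, so you also have to establish when it does.
So in the Conclusion of your paper you need to mention how you solved the first two. Whether momentum exists has to be settled up front, otherwise the rest of the solution can't get going.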
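Reading on.
"Data is provided for every point from all Wimbledon 2023 men’s matches after the first 2 rounds."
Data from after the first two rounds. Why nothing from the first two rounds??
"You may choose to include additional player information or other data at your discretion, but you must completely document the sources."
OK, so you can scrape your own. Personal opinion: if they say you may add data, then you absolutely should; scraped data makes the paper more convincing and improves your odds of an award.
Now for the main event: the questions!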
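Question 1
"Use the data to: • Develop a model that captures the flow of play as points occur and apply it to one or more of the matches. Your model should identify which player is performing better at a given time in the match, as well as how much better they are performing. Provide a visualization based on your model to depict the match flow. Note: in tennis, the player serving has a much higher probability of winning the point/game. You may wish to factor this into your model in some way."
A few things to note about this question:
You only need to look at the time steps where the score changes, i.e. point by point. That's easy to accept; it's how the data set is given to us.
The model has to be applied to matches. Also perfectly normal: once a model is built, it obviously has to be tried on a match to see the results.
Key point 1: “Your model should identify which player is performing better at a given time in the match, as well as how much better they are performing.” Isn't that just Momentum? It's both players' Momentum at a given time in the match. How you quantify this Momentum can be one of your innovations.
Key point 2: “Provide a visualization based on your model to depict the match flow.” A figure that is explicitly requested must get real attention; you can even put serious effort into it and make it as distinctive, attractive and clear as you can.
Key point 3: the model needs to account for whether the player is serving, and ideally you show this feature of the model in your paper.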
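To make the serving point concrete, here is a minimal sketch of one way to fold the serve into a running momentum score. Everything in it is an illustrative assumption rather than a reference solution: the function name `momentum_series`, the inputs as plain lists of 1/2 for point winner and server, the placeholder `serve_win_prob=0.65`, and the exponential `decay`. The only idea is that a point won against serve is more surprising, so it earns more credit.

```python
def momentum_series(point_winner, server, serve_win_prob=0.65, decay=0.9):
    """Return player 1's momentum after each point.

    point_winner[i] and server[i] are 1 or 2 (hypothetical encoding).
    serve_win_prob is an assumed placeholder, not a value from the data.
    """
    momentum, series = 0.0, []
    for winner, srv in zip(point_winner, server):
        # Probability that the actual winner would take this point.
        p_winner = serve_win_prob if winner == srv else 1.0 - serve_win_prob
        # Expected outcomes earn little credit, upsets earn a lot.
        credit = 1.0 - p_winner
        signed = credit if winner == 1 else -credit
        # Older points fade out geometrically.
        momentum = decay * momentum + signed
        series.append(momentum)
    return series


winners = [1, 1, 2, 1]   # made-up points: who won each one
servers = [1, 1, 1, 1]   # player 1 serves the whole game
series = momentum_series(winners, servers)
print([round(m, 4) for m in series])
```

A positive value says player 1 is doing better, a negative value says player 2 is, and the magnitude is the "how much better". Plotting this series against the point index gives a first version of the match-flow figure.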
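Question 2
"A tennis coach is skeptical that “momentum” plays any role in the match. Instead, he postulates that swings in play and runs of success by one player are random. Use your model/metric to assess this claim."
This is a leading question and the answer is already fixed: you are certainly going to say the coach is wrong. Use your Question 1 model to show that the swings in the match are clearly related to momentum. The approach is simple too, and it ties back to the key point I stressed in the third paragraph of the problem; the exact spot is in the red box below:
If, when there is no Momentum, the match shows very little volatility, doesn't that refute the coach's claim? I trust the sharp ones among you have already got it.
Of course you need to measure volatility with a concrete numeric index. Beyond that, if your Momentum is a continuous value, you can compute its correlation coefficient with volatility to show that Momentum is a valid measure.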
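As a rough sketch of the correlation idea, the snippet below computes a Pearson coefficient between momentum strength and a volatility index. The volatility index used here (trailing standard deviation of the running point margin) is purely an assumption, since the post does not fix one, and so are the function names, the window of 6, the decay of 0.9 and the made-up point sequence.

```python
from statistics import pstdev


def ewma_momentum(signed_outcomes, decay=0.9):
    """Running momentum for player 1; +1 = player 1 won the point, -1 = lost it."""
    m, series = 0.0, []
    for s in signed_outcomes:
        m = decay * m + s
        series.append(m)
    return series


def rolling_volatility(signed_outcomes, window=6):
    """Trailing population std of the running point margin (an assumed index)."""
    margin, running = [], 0
    for s in signed_outcomes:
        running += s
        margin.append(running)
    return [pstdev(margin[max(0, i - window + 1): i + 1])
            for i in range(len(margin))]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


outcomes = [1, 1, 1, 1, 1, -1, 1, -1, 1, -1, -1, -1, -1, -1, -1]  # made-up points
strength = [abs(m) for m in ewma_momentum(outcomes)]
volatility = rolling_volatility(outcomes)
r = pearson(strength, volatility)
print(f"correlation between |momentum| and volatility: {r:.3f}")
```

A single coefficient means little on its own. To actually speak to the coach's "it's random" claim, you would likely want to compare it against the same statistic computed on shuffled point sequences.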
Question 3
"Coaches would love to know if there are indicators that can help determine when the flow of play is about to change from favoring one player to the other. Using the data provided for at least one match, develop a model that predicts these swings in the match. What factors seem most related (if any)? Given the differential in past match “momentum” swings how do you advise a player going into a new match against a different player?"
Tired. Don't feel like writing any more.
Update, Feb 2, afternoon:
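Carrying on. This is probably the longest, most tedious question of the lot. Luckily none of it is very hard.
One sentence at a time:
"Coaches would love to know if there are indicators that can help determine when the flow of play is about to change from favoring one player to the other."
My rough translation: the coach wants to know which indicators can be used to judge when momentum is about to shift. Easy to understand: when the momentum is about to shift, the coach calls a timeout and gives the player a pep talk.
Roughly, you're asked to say which factors drive changes in match momentum. Manageable, right? It's just feature selection in machine learning.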
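What I used to do, which is rather odd yet perfectly logical, is this: once the machine learning model is trained and a bunch of metrics have shown that it performs well, I pull out the model's weight for each feature and use that directly to judge which indicator matters more. The code is simple too. If you use scikit-learn, a fitted model exposes these weights as an attribute (`coef_` on linear models, `feature_importances_` on tree-based models; there is no attribute literally named weight). Print it for your trained model, or draw a bar chart of it and put that in the paper.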
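Here is a minimal sketch of that weight-reading trick with scikit-learn, on synthetic data. The feature names `feat_a`, `feat_b`, `feat_c` and the data itself are made up; the label is deliberately built to depend on `feat_a` only, so you can see it come out on top. `feature_importances_` (tree ensembles) and `coef_` (linear models) are the real scikit-learn attributes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["feat_a", "feat_b", "feat_c"]  # hypothetical feature names
X = rng.normal(size=(300, 3))
# Synthetic label that depends (almost) only on the first feature.
y = (X[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
logit = LogisticRegression().fit(X, y)

# Tree ensembles expose feature_importances_; linear models expose coef_.
for name, imp, coef in zip(feature_names,
                           forest.feature_importances_,
                           logit.coef_[0]):
    print(f"{name}: importance={imp:.3f}, coefficient={coef:+.3f}")
```

Keep in mind that coefficients are only comparable across features when the features are on similar scales, and that both numbers describe the fitted model rather than proving a causal link.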
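This question is split into two sub-sub-questions; what we just read was the lead-in to both. Let's see what each one is.
OK, next sentence!
"Using the data provided for at least one match, develop a model that predicts these swings in the match. What factors seem most related (if any)?"
Whoa, I didn't read carefully. Isn't this the same as Question 1? Isn't it just the Question 1 prediction, and then pulling out the features again?
Of course not, my friend.
Don't panic. See whether what I say makes sense.
These “swings” are changes in momentum. The problem presumably left it abbreviated here on purpose, but spells it out in the next sentence as “momentum” swings; go and look at the next sentence yourself.
So the Question 1 model predicts momentum (Momentum), while this question's model has to predict changes in momentum (Momentum Swings). Of course the words don't correspond exactly; this is just a translation based on my reading of the problem.
So Question 3 adds one step on top of the Question 1 model's output. Once you have Momentum values, combine them with other indicators to predict at which point the momentum switches.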
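Now, if you've done Question 1, you probably predict Momentum as a continuous value rather than a discrete one. In other words both sides have momentum; it isn't “with you” or “with me”, so there's no such thing as a switch. My suggestion, if I were doing it: just compare the two values and say the side with the larger one holds the momentum.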
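I suspect many of you are thoroughly confused, so here's a more concrete method. You write a model that predicts changes in momentum. The side with the larger Momentum value holds the momentum, and the result you need to deliver is when that changes. But predicting the change directly is still hard to think about. You can do this instead: encode the momentum of a match with the labels {0, 1, -1}, where 0 means there's no momentum to speak of, 1 means one side holds it, and -1 means the other side does. Which of the three labels applies at each point comes from comparing the two sides' Momentum values from Question 1. After that your model becomes a discrete label prediction; once you've predicted the labels, the points where the sign flips are where the momentum changes. I'm sure you all have plenty of ways to do that.
"What factors seem most related (if any)?"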
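Going back to the {0, 1, -1} labelling: here is a minimal sketch of turning two continuous momentum series into labels and then reading off the swing points. The function names, the tolerance `tol` that decides when the two sides are "too close to call" (label 0), and the sample numbers are all assumptions for illustration; the post does not say how the 0 case should be defined.

```python
def momentum_labels(m1, m2, tol=0.05):
    """1 if player 1 holds momentum, -1 if player 2 does, 0 if too close to call.

    m1 and m2 are the two players' momentum values per point; tol is an
    assumed threshold, not something given in the problem.
    """
    labels = []
    for a, b in zip(m1, m2):
        diff = a - b
        labels.append(0 if abs(diff) <= tol else (1 if diff > 0 else -1))
    return labels


def swing_points(labels):
    """Indices where the holder flips between 1 and -1 (zeros are skipped over)."""
    swings, last = [], 0
    for i, lab in enumerate(labels):
        if lab == 0:
            continue
        if last != 0 and lab != last:
            swings.append(i)
        last = lab
    return swings


m1 = [0.5, 0.6, 0.3, 0.1, 0.2, 0.7]  # made-up momentum for player 1
m2 = [0.2, 0.2, 0.3, 0.5, 0.6, 0.1]  # made-up momentum for player 2
labels = momentum_labels(m1, m2)
print(labels)                # [1, 1, 0, -1, -1, 1]
print(swing_points(labels))  # [3, 5]
```

The `labels` list is what a classifier would be trained to predict; `swing_points` applied to its predictions gives the predicted swings.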
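I've covered this already, so just briefly: this is the feature extraction from before, and whichever feature gets the higher weight is the more related one.
"Given the differential in past match “momentum” swings how do you advise a player going into a new match against a different player?"
Anyway, in MCM, whenever you see “advise”, that's the writer's moment to shine. The modeller and the programmer explain the results clearly to the writer, who then writes it up from the references.
Still, one pointer: this asks you to advise the player on what to do when facing a new opponent. You can start from the moments where he lost momentum in each past match, because this whole question is about momentum switching, so framing it that way keeps things tightly connected. Base the specific advice on the swing-related indicators found above.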
Question 4
"Test the model you developed on one or more of the other matches. How well do you predict the swings in the match? If the model performs poorly at times, can you identify any factors that might need to be included in future models? How generalizable is your model to other matches (such as Women’s matches), tournaments, court surfaces, and other sports such as table tennis."
OK, Question 3 was over the top. Generally speaking, the next two questions are a breeze. Well, not necessarily; I did say generally.
Question 4:
"Test the model you developed on one or more of the other matches. How well do you predict the swings in the match?"
We're asked to show results. Easy: show off the earlier model's results, and whether or not they're any good, say they're great. Just put some care into the figures. Let me dig out a couple of figures from my own previous contests.
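Great, I can't find the papers. Never mind.
"If the model performs poorly at times, can you identify any factors that might need to be included in future models?"
At first glance you can't tell what model to build; at second glance you don't actually need one. The gist is: which promising features could be added to the model. If it were me, I'd treat it as an open-ended question and handle it by reading the literature and telling a story, though it's best to tie it back to the earlier feature-selection results.
"How generalizable is your model to other matches (such as Women’s matches), tournaments, court surfaces, and other sports such as table tennis."
You need to demonstrate the model's transferability, i.e. its ability to generalize to other tasks. Anyone who did the problems from the last couple of years will find this very familiar: predict used sailboat prices, then transfer the model to the Hong Kong market. Yes, one part of last year's Problem Y was very similar; go and see how that year's outstanding papers handled it. I myself used some statistical methods, such as transfer entropy. But it's up to you.
One more crucial point: the content for this question belongs in the discussion section, and a very important part of the discussion section is sensitivity analysis. Make sure you do it.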
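As a rough idea of what a sensitivity analysis could look like here, the sketch below varies one model parameter and watches how a headline result moves. It assumes an exponentially decayed momentum score and counts how often its sign flips; the function names, the `decay` values and the point sequence are all made up for illustration.

```python
def ewma_momentum(signed_outcomes, decay):
    """Running momentum for player 1; +1 = player 1 won the point, -1 = lost it."""
    m, out = 0.0, []
    for s in signed_outcomes:
        m = decay * m + s
        out.append(m)
    return out


def count_sign_changes(series):
    """Number of times the series switches between positive and negative."""
    signs = [1 if v > 0 else -1 for v in series if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


outcomes = [1, 1, -1, 1, -1, -1, -1, 1, 1, -1, 1, 1]  # made-up point results
for decay in (0.5, 0.7, 0.9):
    n = count_sign_changes(ewma_momentum(outcomes, decay))
    print(f"decay={decay}: {n} momentum swings")
```

If the number of swings barely moves across a reasonable range of the parameter, you can say the conclusion is robust; if it jumps around, that is worth reporting too.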
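Question 5
"Produce a report of no more than 25 pages with your findings and include a one- to two-page memo summarizing your results with advice for coaches on the role of “momentum”, and how to prepare players to respond to events that impact the flow of play during a tennis match."
This one's the same every year. Just one point: the two-page memo should ideally include a few good-looking figures.
Finally: make sure you see it through to the end. Even with the simplest algorithms, as long as your figures are well done and you write the abstract carefully, you can still win an award. Also, do get someone experienced to polish the abstract when the time comes; in my experience asking a teacher doesn't help much.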