How do I select the first 280 words, up to the nearest period?

I need to extract a shorter passage containing a specified number of words from a longer text. I can do it like this:

    text = "There was a very big cat that was sitting on the ledge. It was overlooking the garden. The dog next door watched with curiosity."
    # 0...15 (exclusive range) takes exactly 15 words; 0..15 would take 16
    text.split[0...15].join(' ')
    #=> "There was a very big cat that was sitting on the ledge. It was overlooking"

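I want to select the text up to the next period, so that I don't end up with a partial sentence.

Is there a way, possibly with a regular expression, to take the text up to and including the first period that follows the 15th word?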

You can use

 (?:\w+[,.?!]?\s+){14}(?:\w+,?\s+)*?\w+[.?!]

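This matches a word, an optional comma, period, question mark, or exclamation mark, and whitespace, repeated 14 times. It then lazily repeats a word (optionally followed by a comma) plus whitespace, and finishes with one more word followed by a period, question mark, or exclamation mark. Because that middle repetition is lazy, the match ends at the first sentence-ending punctuation at or after the 15th word.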

https://regex101.com/r/ardIQ7/4
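The pattern above can be wrapped so that the word count is a parameter (15 here, 280 in the title). This is only a minimal sketch: the method name `words_through_sentence_end` is hypothetical, and it assumes `n` is at least 1 and that words consist of `\w` characters only.

```ruby
# Hypothetical helper (the name is not from the original answer).
# Returns the first n words of text, extended to the end of the sentence
# that contains the n-th word, or nil when the pattern does not match.
def words_through_sentence_end(text, n)
  # Same pattern as above, with the fixed 14 replaced by n - 1.
  pattern = /(?:\w+[,.?!]?\s+){#{n - 1}}(?:\w+,?\s+)*?\w+[.?!]/
  text[pattern]
end

text = "There was a very big cat that was sitting on the ledge. " \
       "It was overlooking the garden. " \
       "The dog next door watched with curiosity."

puts words_through_sentence_end(text, 15)
# prints: There was a very big cat that was sitting on the ledge. It was overlooking the garden.
```

Since `\w+` does not match apostrophes or hyphens, a word such as "don't" would interrupt the match, so the character class would probably need widening for real-world text.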

    r = /
        (?:            # begin a non-capture group
          \p{Alpha}+   # match one or more letters
          [.!?]?       # optionally ('?' following ']') match one of the 3 punctuation chars
          [ ]+         # match one or more spaces
        )              # end non-capture group
        {14,}?         # execute the preceding non-capture group at least 14 times, lazily ('?')
        \p{Alpha}+     # match one or more letters
        [.!?]          # match one of the three punctuation characters
        /x             # free-spacing regex definition mode

    text[r]
      #=> "There was a very big cat that was sitting on the ledge. It was overlooking
      #    the garden."

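Free-spacing mode strips whitespace from the pattern, which is why the space character above is written inside a character class ([ ]+). Written conventionally, the regular expression is as follows.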

 /(?:\p{Alpha}+[.!?]? +){14,}?\p{Alpha}+[.!?]/
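The remark about whitespace in free-spacing mode can be checked with two toy patterns. These patterns are illustrative only and are not part of the original answer.

```ruby
# In free-spacing mode (/x) a bare space in the pattern is ignored,
# so /a b/x behaves like /ab/.
bare = /a b/x

# A space inside a character class is kept as a literal space.
bracketed = /a[ ]b/x

puts bare.match?("ab")        # true
puts bare.match?("a b")       # false
puts bracketed.match?("a b")  # true
```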

You could do something along these lines:

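    text = "There was a very big cat that was sitting on the ledge. It was overlooking the garden. The dog next door watched with curiosity."
    tgt = 15
    # split the text into sentences, each ending in a period plus an optional space
    old_text = text.scan(/[^.]+\.\s?/)
    new_text = []
    # old_text.any? (rather than a bare old_text, which is always truthy)
    # stops the loop when the text runs out of sentences
    while old_text.any? && new_text.join.scan(/\b\p{Alpha}+\b/).length <= tgt
      new_text << old_text.shift
    end
    p new_text.join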

Prints:

 "There was a very big cat that was sitting on the ledge. It was overlooking the garden. "

This works with ordinary sentences of any length, and stops adding sentences as soon as the accumulated word count exceeds the target.
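The same sentence-by-sentence idea can be sketched as a method that keeps a running word count instead of rejoining and rescanning the collected text on every pass. The name `sentences_up_to` is hypothetical, and two choices differ from the loop above and are assumptions: `!` and `?` are also treated as sentence endings, and collection stops as soon as the count reaches the target rather than once it exceeds it.

```ruby
# Hypothetical helper, not from the original answer.
# Collects whole sentences until at least `target` words have been gathered.
def sentences_up_to(text, target)
  # Assumption: a sentence ends with one or more of . ! ?
  sentences = text.scan(/[^.!?]+[.!?]+\s*/)
  count = 0
  picked = []
  sentences.each do |sentence|
    break if count >= target
    picked << sentence
    count += sentence.scan(/\b\p{Alpha}+\b/).length
  end
  picked.join.rstrip   # drop the trailing space left by the last sentence
end

text = "There was a very big cat that was sitting on the ledge. " \
       "It was overlooking the garden. " \
       "The dog next door watched with curiosity."

puts sentences_up_to(text, 15)
# prints: There was a very big cat that was sitting on the ledge. It was overlooking the garden.
```

If the text holds fewer words than the target, the whole text is returned.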
