Rails 3 user matching algorithm to SQL query (COMPLICATED)

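I'm currently working on an app that matches users based on the questions they have answered. I implemented my algorithm in plain RoR with ActiveRecord queries, but it is far too slow. Matching one user against 100 other users takes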

Completed 200 OK in 17741ms (Views: 106.1ms | ActiveRecord: 1078.6ms) 

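on my local machine. But still... I now want to implement this in raw SQL to get more performance. The trouble is that I'm really struggling to wrap my head around SQL queries nested inside SQL queries, plus the calculations on top of that. My head is about to explode and I don't even know where to start.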

Here's my algorithm:

    def match(user)
      @a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
      @b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
      if self.common_questions(user) == []
        0.to_f
      else
        match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
        if match <= 0
          0.to_f
        else
          match
        end
      end
    end

    def possible_score(user)
      i = 0
      self.user_questions.select("question_id, importance").find_each do |n|
        if user.user_questions.select(:id).find_by_question_id(n.question_id)
          i += Importance.find_by_id(n.importance).value
        end
      end
      return i
    end

    def actual_score(user)
      i = 0
      self.user_questions.select("question_id, importance").includes(:accepted_answers).find_each do |n|
        @user_answer = user.user_questions.select("answer_id").find_by_question_id(n.question_id)
        unless @user_answer == nil
          if n.accepted_answers.select(:answer_id).find_by_answer_id(@user_answer.answer_id)
            i += Importance.find_by_id(n.importance).value
          end
        end
      end
      return i
    end

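So basically a user answers questions, chooses which answers he accepts, and says how important each question is to him. The algorithm then looks at the questions two users have in common. If user1 gave an answer that user2 accepts, the importance user2 assigned to that question is added to user1's score, and the same is done in the other direction for user2. Dividing each score by the possible score gives a percentage, and the geometric mean of the two percentages gives the total match percentage for the two users. Pretty complicated, I know. Tell me if I didn't explain it well enough. I just hope I can express this in raw SQL. Performance is everything here.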
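To make the description above concrete, here is a minimal in-memory sketch of the same calculation in plain Ruby, with no database involved. The `Answered` struct, the hash-per-user layout and the sample data are hypothetical stand-ins I made up for illustration; they are not my real models. It also assumes importance values are positive, so the possible score is never zero when there are common questions.

```ruby
# Hypothetical stand-in for one user_questions row, with the importance
# already resolved to its numeric value.
Answered = Struct.new(:answer_id, :accepted_answer_ids, :importance_value)

# Each user is a hash of question_id => Answered.

# The most points `owner` can award: the importance of every question
# that `other` has answered too.
def possible_score(owner, other)
  owner.sum { |question_id, uq| other.key?(question_id) ? uq.importance_value : 0 }
end

# The points `owner` actually awards: the importance of every common
# question where `other` gave an answer that `owner` accepts.
def actual_score(owner, other)
  owner.sum do |question_id, uq|
    theirs = other[question_id]
    (theirs && uq.accepted_answer_ids.include?(theirs.answer_id)) ? uq.importance_value : 0
  end
end

def match_percentage(a, b)
  common = a.keys & b.keys
  return 0.0 if common.empty?

  a_score = 100.0 * actual_score(a, b) / possible_score(a, b)
  b_score = 100.0 * actual_score(b, a) / possible_score(b, a)
  # Geometric mean minus a penalty for few common questions
  # (integer division, exactly as in the code above).
  result = Math.sqrt(a_score * b_score) - (100 / common.size)
  result <= 0 ? 0.0 : result
end

alice = {
  1 => Answered.new(10, [10], 10),
  2 => Answered.new(20, [20, 21], 10),
  3 => Answered.new(30, [30], 50),
  4 => Answered.new(40, [40], 10)
}
bob = {
  1 => Answered.new(10, [10], 10),
  2 => Answered.new(21, [22], 10),
  3 => Answered.new(30, [31], 10),
  4 => Answered.new(40, [41], 10),
  5 => Answered.new(50, [50], 10) # not answered by alice, so ignored
}

puts match_percentage(alice, bob) # prints 25.0
```

In this sample alice accepts all four of bob's answers (100%), while bob accepts only one of alice's four (25%). The geometric mean is 50, and the penalty for four common questions is 25.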

Here are my database tables:

    CREATE TABLE "users" (
      "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
      "username" varchar(255) DEFAULT '' NOT NULL
    );
    -- (left some unimportant stuff out; it's all there in the database dump I uploaded)

    CREATE TABLE "user_questions" (
      "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
      "user_id" integer,
      "question_id" integer,
      "answer_id" integer(255),
      "importance" integer,
      "explanation" text,
      "private" boolean DEFAULT 'f',
      "created_at" datetime
    );

    CREATE TABLE "accepted_answers" (
      "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
      "user_question_id" integer,
      "answer_id" integer
    );

I guess the top of the SQL query has to look something like this?

    SELECT u1.id AS user1, u2.id AS user2,
           COALESCE(SQRT( (100.0*actual_score/possible_score) * (100.0*actual_score/possible_score) ), 0) AS match
    FROM

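But since I'm no SQL master and can only do the usual stuff, my head is about to explode. I hope somebody can help me figure this out. Or at least improve my performance somehow! Thanks a lot!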

EDIT:

So, based on Wizard's answer, I managed to get a nice SQL statement for "possible_score":

    SELECT SUM(value) AS sum_id
    FROM user_questions AS uq1
    INNER JOIN importances ON importances.id = uq1.importance
    INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
    WHERE uq1.user_id = 1

I tried to get "actual_score" with the following, but it didn't work. My database manager crashed when I executed it.

    SELECT SUM(imp.value) AS sum_id
    FROM user_questions AS uq1
    INNER JOIN importances imp ON imp.id = uq1.importance
    INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
    INNER JOIN accepted_answers as ON as.user_question_id = uq1.id AND as.answer_id = uq2.answer_id
    WHERE uq1.user_id = 1

EDIT2

OK, I'm an idiot! Of course I can't use "as" as an alias, since AS is an SQL keyword. Changed it to aa and it worked! W00T!

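I know you're considering moving to a SQL solution, but there are some major performance improvements you can make to your Ruby code that might eliminate the need for hand-coded SQL. When optimizing code, it's usually worth running a profiler to make sure you really know which parts are the problem. In your example, I think some big improvements can be made by removing the iterative code and the database queries that are executed during each iteration!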
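As a rough illustration of why the per-iteration queries hurt, here's a minimal sketch in plain Ruby. `FakeTable` is a hypothetical stand-in that just counts how many "queries" it serves; it is not ActiveRecord, and the rows are made-up sample data.

```ruby
# Hypothetical stand-in for a database table that counts every query it serves.
class FakeTable
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  # One "query" that returns a single row.
  def find_by_id(id)
    @queries += 1
    @rows.find { |row| row[:id] == id }
  end

  # One "query" that returns every row.
  def all
    @queries += 1
    @rows.dup
  end
end

importances = FakeTable.new([
  { id: 1, value: 1 },
  { id: 2, value: 10 },
  { id: 3, value: 50 }
])
user_questions = [{ importance: 2 }, { importance: 3 }, { importance: 2 }, { importance: 1 }]

# Per-iteration lookup, like Importance.find_by_id inside the loop: one query per row.
slow_total = user_questions.sum { |uq| importances.find_by_id(uq[:importance])[:value] }
slow_queries = importances.queries

# Load once, index by id in memory, then loop without touching the table again.
value_by_id = importances.all.map { |row| [row[:id], row[:value]] }.to_h
fast_total = user_questions.sum { |uq| value_by_id[uq[:importance]] }
fast_queries = importances.queries - slow_queries

puts "per-row lookups: total #{slow_total}, queries #{slow_queries}"
puts "preloaded:       total #{fast_total}, queries #{fast_queries}"
```

Both approaches give the same total, but the first issues one query per user question while the second issues a single query no matter how many rows it loops over. Letting the database do the join and the sum goes one step further.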

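Also, if you're using a recent version of ActiveRecord, you can generate queries with subselects without writing any SQL yourself. And of course, it's important that you have proper indexes created for your database.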

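I'm making a lot of assumptions about your models and relationships, based on what I can infer from your code. If I'm wrong, let me know and I'll try to adjust accordingly.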

    def match(user)
      if self.common_questions(user) == []
        0.to_f
      else
        # Move a_score and b_score calculation inside this conditional branch since it is otherwise not needed.
        @a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
        @b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
        match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
        if match <= 0
          0.to_f
        else
          match
        end
      end
    end

    def possible_score(user)
      # If user_questions.importance contains ID values of importances, then you should set up a relation between UserQuestion and Importance.
      # Ie UserQuestion belongs_to :importance, and Importance has_many :user_questions.
      # I'm assuming that user_questions represents join models between users and questions.
      # Ie User has_many :user_questions, and User has_many :questions, :through => :user_questions.
      # Question has_many :user_questions, and Question has_many :users, :through => :user_questions
      # From your code this seems like the logical setup. Let me know if my assumption is wrong.
      self.user_questions.
        joins(:importance). # Requires the relation between UserQuestion and Importance I described above
        where(:question_id => Question.joins(:user_questions).where(:user_id => user.id)). # This should create a where clause with a subselect with recent versions of ActiveRecord
        sum(:value) # I'm also assuming that the importances table has a `value` column.
    end

    def actual_score(user)
      user_questions.
        joins(:importance, :accepted_answers). # It looks like accepted_answers indicates an answers table
        where(:answer_id => Answer.joins(:user_questions).where(:user_id => user.id)).
        sum(:value)
    end

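UserQuestion seems to be a super join model between User, Question, Answer and Importance. Here are the model relationships relevant to the code (not including the has_many :through relationships you could create). I think you probably already have these: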

    # User
    has_many :user_questions

    # UserQuestion
    belongs_to :user
    belongs_to :question
    belongs_to :importance, :foreign_key => :importance # Maybe rename the column `importance` to `importance_id`
    belongs_to :answer

    # Question
    has_many :user_questions

    # Importance
    has_many :user_questions

    # Answer
    has_many :user_questions

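So here's my new match function. I couldn't put everything into one query yet, because SQLite doesn't support math functions. But as soon as I switch to MySQL I'll put everything into one query. All of this has already given me a huge performance boost: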

 Completed 200 OK in 528ms (Views: 116.5ms | ActiveRecord: 214.0ms) 

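for matching one user against 100 other users. Sweet! I'll have to see how well it holds up once I've populated my database with 10k fake users. And extra kudos to "Wizard of Ogz" for pointing out my inefficient code!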

EDIT:

Tried it with only 1,000 users, each with 10 to 100 user questions, and...

 Completed 200 OK in 104871ms (Views: 2146.0ms | ActiveRecord: 93780.5ms) 

...boy, did that take long! I'll have to think of something to address this.

    def match(user)
      if self.common_questions(user) == []
        0.to_f
      else
        # The same SQL serves both directions; only the order of the bound ids differs.
        score_sql = <<-SQL
          SELECT 100.0*as1.actual_score/ps1.possible_score AS match
          FROM (SELECT SUM(imp.value) AS actual_score
                FROM user_questions AS uq1
                INNER JOIN importances imp ON imp.id = uq1.importance
                INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ?
                INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id
                WHERE uq1.user_id = ?) AS as1,
               (SELECT SUM(value) AS possible_score
                FROM user_questions AS uq1
                INNER JOIN importances ON importances.id = uq1.importance
                INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ?
                WHERE uq1.user_id = ?) AS ps1
        SQL
        @a_score = UserQuestion.find_by_sql([score_sql, user.id, self.id, user.id, self.id]).collect(&:match).first.to_f
        @b_score = UserQuestion.find_by_sql([score_sql, self.id, user.id, self.id, user.id]).collect(&:match).first.to_f
        match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
        if match <= 0
          0.to_f
        else
          match
        end
      end
    end