-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Textmodel integration #250
base: master
Are you sure you want to change the base?
Conversation
I think you've readded some files from TextModels.jl that we don't need, could you remove those? 🙂 |
Sure! Will clean up in the next commit. |
julia> batches = FastText.load_batchseq(data, task)
julia> batches[1][1]
92-element Vector{Vector{Int64}}:
[25000, 25000, 25000, 25000, 25000, 25000, 25000, 25000]
[633779, 633779, 633779, 633779, 633779, 633779, 633779, 633779]
[2731, 34, 315, 354, 2087, 2209, 70, 1307]
[44047, 435, 633779, 633779, 6589, 633779, 633779, 205]
⋮
[0, 0, 0, 0, 0, 213, 0, 0]
[0, 0, 0, 0, 0, 25, 0, 0]
[0, 0, 0, 0, 0, 1778, 0, 0]
julia> batches[1][2]
8-element Vector{Int64}:
1
1
1
1
1
0
1
1 |
Next:
|
Should the vocab CSV files be checked in? I would've assumed they would be artifacts or DataDeps as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Porting the TextModels code is going to be a big task, but for now the priority is just to make things work e2e :)
FastText/src/models/pretrain_lm.jl
Outdated
Chain( | ||
de, | ||
VarDrop(in_drop_prob), | ||
AWD_LSTM(embedding_size, hid_lstm_sz, hid_drop_prob; init = (dims...) -> init_weights(1/hid_lstm_sz, dims...)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, I think this repeated lambda should be pulled out into a helper function of some sort.
FastText/src/models/custom_layers.jl
Outdated
function asgd_step!(iter::Integer, layer::AWD_LSTM) | ||
if (iter >= layer.T) & (layer.T > 0) | ||
p = get_trainable_params([layer]) | ||
avg_fact = 1/max(iter - layer.T + 1, 1) | ||
if avg_fact != 1 | ||
layer.accum = layer.accum .+ p | ||
for (ps, accum) in zip(p, layer.accum) | ||
ps .= avg_fact*accum | ||
end | ||
else | ||
layer.accum = deepcopy(p) # Accumulator for ASGD | ||
end | ||
end | ||
return | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lorenzoh I don't believe we have any utils for this in Flux or FastAI, right? Haven't kept up with the list of callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think so. This also seems very model-specific so it’s maybe best to keep it as part of the model.
Co-authored-by: Brian Chen <[email protected]>
Co-authored-by: Brian Chen <[email protected]>
Co-authored-by: Brian Chen <[email protected]>
Co-authored-by: Brian Chen <[email protected]>
Minor changes as suggested by @ToucheSir
julia> data, blocks = load(datarecipes()["imdb"])
((mapobs(loadfile, ObsView(::MLDatasets.FileDataset{typeof(identity), String}, ::Vector{Int64})), mapobs(parentname, ObsView(::MLDatasets.FileDataset{typeof(identity), String}, ::Vector{Int64}))), (Paragraph(), Label{String}(["neg", "pos"])))
julia> task = TextClassificationSingle(blocks, data)
SupervisedTask(Paragraph -> Label{String})
julia> model = FastAI.taskmodel(task, FastText.LanguageModel(false, task))
#90 (generic function with 1 method)
julia> batches = FastText.load_batchseq(data, task)
6250-element Vector{Tuple{Vector{Vector{Int64}}, WARNING: both Losses and NNlib export "ctc_loss"; uses of it in module Flux must be qualified
Flux.OneHotArray{UInt32, 2, 1, 2, Vector{UInt32}}}}:
([[35, 35, 35, 35], [3, 3, 3, 9], [40, 18025, 15, 14], [224, 10, 3541, 3040], [737, 34, 24, 505], [49, 7, 809, 3], [4, 4, 221, 3836], [1927, 104, 4,
3], [7, 16, 629, 28440], [6, 351, 7, 17] … [2, 2, 2, 44], [2, 2, 2, 3], [2, 2, 2, 9839], [2, 2, 2, 17], [2, 2, 2, 1041], [2, 2, 2, 27], [2, 2, 2, 3], [2, 2, 2, 3836], [2, 2, 2, 3], [2, 2, 2, 28440]], [0 0 1 1; 1 1 0 0])
julia> using FluxTraining
julia> td, vd = splitobs(batches, at=0.9)
julia> using Flux
julia> learner = Learner(model, Flux.Losses.logitcrossentropy, callbacks=[Metrics(accuracy)]; data=(td, vd))
Learner()
julia> fit!(learner, 1)
Epoch 1 TrainingPhase(): 0%|█ | ETA: 4 days, 3:35:31 |
Change `AWD_LSTM` to `WeightDroppedLSTM`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just reviewing the weight dropped LSTM for now
h::S | ||
c::S |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are now replaced by state0
and stored in Recur
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is still outstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More changes from the call
true, | ||
(Flux.zeros32(out, 1), Flux.zeros32(out, 1)) | ||
) | ||
cell.b[gate(out, 2)] .= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it only mask part of the bias @Chandu-4444?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is ported from TextModels.jl I'm not completely sure of this. Even Flux.jl has this line this.
Would be helpful if someone could explain the reason.
testmode!(m::DroppedEmbeddings, mode=true) = | ||
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m) | ||
|
||
function reset_masks!(de::DroppedEmbeddings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overload Flux.reset!
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bumping this comment
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
FastText/src/models/custom_layers.jl
Outdated
mutable struct VarDrop{F} | ||
p::F | ||
mask | ||
reset::Bool | ||
active::Bool | ||
end | ||
|
||
VarDrop(p::Float64=0.0) = VarDrop(p, Array{Float32, 2}(UndefInitializer(), 0, 0), true, true) | ||
|
||
function (vd::VarDrop)(x) | ||
vd.active || return x | ||
if vd.reset | ||
vd.mask = drop_mask(x, vd.p) | ||
vd.reset = false | ||
end | ||
return (x .* vd.mask) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As promised, here's the proposal from our call today:
mutable struct VarDropCell{F}
p::F
active::Union{Bool, Nothing} # matches other norm layers
end
@functor VarDropCell
VarDropCell(p::Real=0.0) = VarDropCell(p, nothing)
function (vd::VarDropCell)((has_mask, mask), x)
if Flux._isactive(vd)
mask = has_mask ? mask : drop_mask(x, vd.p)
return (true, mask), x .* mask
elseif !has_mask
return (has_mask, mask), x
else
error("Mask set but layer is in test mode. Call `reset!` to clear the mask.")
end
end
# The single-element array keeps Recur happy.
# Limitation: typeof(p) must == typeof(<inputs>)
VarDrop(p::Real) = Recur(VarDropCell(p), ones(typeof(p), 1, 1))
Plus bits to make testmode!
, _isactive
and reset!
work properly.
Co-authored-by: Kyle Daruwalla <[email protected]>
at `DroppedEmbeddings`.
The changes have been merged to #258. |
`TextClassifier`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try this
h::S | ||
c::S |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is still outstanding.
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
FastText/src/models/custom_layers.jl
Outdated
function asgd_step!(iter::Integer, layer::AWD_LSTM) | ||
if (iter >= layer.T) & (layer.T > 0) | ||
p = get_trainable_params([layer]) | ||
avg_fact = 1/max(iter - layer.T + 1, 1) | ||
if avg_fact != 1 | ||
layer.accum = layer.accum .+ p | ||
for (ps, accum) in zip(p, layer.accum) | ||
ps .= avg_fact*accum | ||
end | ||
else | ||
layer.accum = deepcopy(p) # Accumulator for ASGD | ||
end | ||
end | ||
return | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think so. This also seems very model-specific so it’s maybe best to keep it as part of the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the issues that you posted on Zulip should be resolved by this
FastText/src/models/custom_layers.jl
Outdated
maskWi = Flux.dropout_mask(Flux.rng_from_array(), layer.cell.Wi, layer.cell.p) | ||
maskWh = Flux.dropout_mask(Flux.rng_from_array(), layer.cell.Wh, layer.cell.p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maskWi = Flux.dropout_mask(Flux.rng_from_array(), layer.cell.Wi, layer.cell.p) | |
maskWh = Flux.dropout_mask(Flux.rng_from_array(), layer.cell.Wh, layer.cell.p) | |
maskWi = Flux.dropout_mask(Flux.rng_from_array(layer.cell.Wi), layer.cell.Wi, layer.cell.p) | |
maskWh = Flux.dropout_mask(Flux.rng_from_array(layer.cell.Wh), layer.cell.Wh, layer.cell.p) |
testmode!(m::DroppedEmbeddings, mode=true) = | ||
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m) | ||
|
||
function reset_masks!(de::DroppedEmbeddings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bumping this comment
Co-authored-by: Kyle Daruwalla <[email protected]>
Co-authored-by: Kyle Daruwalla <[email protected]>
at `Flux.rng_from_array`
Can do the following: